CN113962364B - A Multi-factor Power Load Forecasting Method Based on Deep Learning - Google Patents

Info

Publication number
CN113962364B
Authority
CN
China
Prior art keywords
data
lstm
power
sequence
load data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111232185.5A
Other languages
Chinese (zh)
Other versions
CN113962364A (en)
Inventor
朱敏
明章强
闫建荣
张万利
赵志龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202111232185.5A
Publication of CN113962364A
Application granted
Publication of CN113962364B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for AC mains or AC distribution networks
    • H02J3/003 Load forecast, e.g. methods or systems for forecasting future load demand
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P80/00 Climate change mitigation technologies for sector-wide applications
    • Y02P80/20 Climate change mitigation technologies for sector-wide applications using renewable energy
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Power Engineering (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a multi-factor power load forecasting method based on deep learning. First, data, including power load data and environmental impact data, are acquired and stored. The data are then preprocessed and standardized through abnormal data detection based on the k-nearest neighbor algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization. Next, an improved CNN-LSTM power load forecasting model is proposed: a CNN feature extraction module learns local features of the input data; the result is fed into an LSTM sequence learning model that extracts sequence feature information of the input data; and a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final forecasting accuracy. Finally, the power load is forecast. The invention can promote the digital upgrading of the power grid, meet users' personalized needs, and support industry correlation analysis, power generation dispatching, electricity consumption trend forecasting, and guidance for the resumption of work and production.

Description

A multi-factor power load forecasting method based on deep learning

Technical Field

The present invention relates to the intersection of artificial intelligence and power load forecasting, and more specifically to a multi-factor user power load forecasting method based on deep learning.

Background Art

In recent years, with the rapid, high-quality development of the national economy, living standards have continued to improve and the demand for electricity has continued to grow. The state has also accelerated the deployment of power projects to ensure an adequate power supply. However, because the generation stage of the related power projects lags behind, decision makers in the power sector cannot accurately track changes in grid load, which leads to decision-making errors, makes it impossible to maintain a reasonable balance between power supply and demand, and wastes a large amount of power resources. Forecasting user power load is significant in the following ways: (1) Mining user consumption patterns: users' consumption information is analyzed and consumption regularities are mined, combined with factors such as region and time, to further explore consumption patterns and the causes of their differences. Grid load forecasts help the power department schedule power resources and provide personalized services across regions and users. (2) Guiding power transmission at the geographic scale: grid load forecasts at the geographic scale can be combined with regional differences in power generation so that the power department can schedule and transmit power resources sensibly. (3) Supporting a visual interactive system: a forecasting and visual analysis model based on power data is built for the load forecasting results and, together with forecasting methods that compare well in analysis, used to implement a grid load forecasting system that vividly displays the forecast load of different regions, time periods, and categories of power users; the factors that influence the load can also be shown in the visualization system.

A grid load forecasting system takes into account influencing factors such as users, region, time, environment, and climate, and forecasts the power load at a future moment or period by building and training a model. Some earlier methods simulated a logistic growth curve model, a multiple linear regression model, a grey prediction model, and a neural network model to build forecasting models, forecast the supply load and electricity consumption of the Harbin area, and identified the more accurate method through comparative analysis. Some methods are based on BP neural networks and, by considering multiple factors such as temperature, obtained fairly accurate outputs. Some researchers proposed a Lasso-PCA data reduction and feature extraction model to reduce the computation and parameter count of the model and improve its execution efficiency, and used an improved adaptive genetic algorithm to optimize BP neural network training; their analysis indicated that this method forecasts power load more accurately. To address the shortcomings of single-scale studies, a recurrent neural network forecasting model using back propagation on a time series decomposition combines the influence of multiple factors to forecast users' electricity consumption in future periods. Other researchers proposed a consumption forecasting model in which a water wave optimization algorithm improves a radial basis function (RBF) neural network, addressing inaccurate load forecasting at the user level; optimizing the hidden layer centers and the spread constant of the RBF network was shown to give higher accuracy.

With the development of smart grids and the growing demands of the power industry, power load forecasting is becoming increasingly important, and the accuracy required of it keeps rising. A review of similar research and techniques shows that most existing grid load forecasting methods fall into two groups. Some rely on traditional algorithms such as regression analysis and time series methods, which cannot account for complex factors such as meteorological data, so their forecasts differ greatly from the true values. Others use machine learning but draw on a single data source, such as current and historical load. Although their short-term forecasts improve on the traditional methods, user load is often strongly affected by environmental factors such as weather, temperature, humidity, and holidays, so these methods are still not accurate enough and their forecasts deviate considerably. Exploring new multi-factor power load forecasting based on deep learning has therefore become a research hotspot.

Summary of the Invention

In view of the above problems, the purpose of the present invention is to provide a multi-factor user power load forecasting method based on deep learning that accurately forecasts power load across different user categories, regions, and environmental conditions. The technical solution is as follows:

A multi-factor power load forecasting method based on deep learning, comprising the following steps:

Step 1: Obtain power load data covering different regions, years, and electricity consumption categories, together with environmental impact data including temperature, humidity, and wind force, and store them in an SQLite database;

Step 2: Clean and preprocess the power load data and environmental impact data, including: abnormal data detection based on the k-nearest neighbor algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization;

Step 3: Introduce a self-attention mechanism to reconstruct the features of the LSTM hidden layers so that the model can be deployed end to end, thereby building a power load forecasting neural network model based on the improved CNN-LSTM, and use the power load data and environmental impact data processed in Step 2 as the training set and test set;

Step 4: Forecast the power load with the improved CNN-LSTM power load forecasting neural network model.

Further, the power load data includes user number, region, urban network/rural network, user classification, electricity consumption category, customer category, voltage level, industry category, contract capacity, operating capacity, first power supply date, frozen electricity date, and frozen electricity at 15-minute intervals; the environmental impact data includes current date, region, maximum temperature, minimum temperature, average humidity, average wind force, weather type, current temperature at 15-minute intervals, and whether the day is a holiday; and the timestamps of the environmental impact data are synchronized with those of the power load data.
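The storage step can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The table names, column names, and the choice to key both tables on region and timestamp are assumptions made for illustration; the source only states that the data is stored in SQLite and that the two data sets are time-synchronized. Only a subset of the listed fields is shown.

```python
import sqlite3

# Hypothetical schema: names and column subset are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS load_data (
    user_id        TEXT NOT NULL,
    region         TEXT NOT NULL,
    usage_class    TEXT,
    ts             TEXT NOT NULL,   -- timestamp on a 15-minute grid
    frozen_energy  REAL,
    PRIMARY KEY (user_id, ts)
);
CREATE TABLE IF NOT EXISTS env_data (
    region       TEXT NOT NULL,
    ts           TEXT NOT NULL,
    temperature  REAL,
    humidity     REAL,
    wind_force   REAL,
    is_holiday   INTEGER,
    PRIMARY KEY (region, ts)
);
"""

def open_store(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def joined_rows(conn, user_id):
    """Return one user's load rows joined with the environmental rows
    that share the same region and timestamp."""
    query = """
        SELECT l.ts, l.frozen_energy, e.temperature, e.humidity,
               e.wind_force, e.is_holiday
        FROM load_data AS l
        JOIN env_data AS e ON e.region = l.region AND e.ts = l.ts
        WHERE l.user_id = ?
        ORDER BY l.ts
    """
    return conn.execute(query, (user_id,)).fetchall()
```

Joining on region and timestamp is one way to express the requirement that environmental data and load data share the same time base; each joined row can then serve as one multi-factor input record.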

Further, the abnormal data detection based on the k-nearest neighbor algorithm and the improved DBSCAN algorithm in Step 2 is specifically as follows:

Step 2.1: Define the average distance between a sample and its neighboring samples as the anomaly score of that sample, use the improved KNN algorithm to obtain the anomaly score of the power load data at each moment, and take the total distance from each cluster to its k neighbors as the final anomaly score;

The set $N_k(c_i)$ of the k nearest neighbors of the power load data $c_i$ at a time i is expressed as:

[Formula image BDA0003316420310000021]

[Formula image BDA0003316420310000031]

In the formula, [symbol image BDA0003316420310000032] denotes one of the k neighbors of $c_i$, $d_k(c_i)$ is the average distance from $c_i$ to its k neighbors, and [symbol image BDA0003316420310000033] denotes the distance between $c_i$ and its neighbor [symbol image BDA0003316420310000034];

The anomaly score of the power load data $c_i$ is then expressed as:

[Formula image BDA0003316420310000035]

where $N_k(c_i)$ denotes the set of the k nearest neighbors of $c_i$.

Finally, the top m clusters in the ranked anomaly score list are output as the outliers of the power load data;
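As a rough illustration of Step 2.1, the sketch below scores each load value by the average distance to its k nearest neighbors and returns the m highest-scoring positions. Because the formula images are not available, the absolute difference between scalar load values is assumed as the distance, and individual points are scored rather than clusters; the function names are hypothetical.

```python
def knn_anomaly_scores(values, k):
    """Average absolute distance from each value to its k nearest neighbours."""
    if not 0 < k < len(values):
        raise ValueError("k must satisfy 0 < k < len(values)")
    scores = []
    for i, v in enumerate(values):
        # Distances to every other point, smallest first.
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def top_m_outliers(values, k, m):
    """Indices of the m points with the largest k-NN anomaly score."""
    scores = knn_anomaly_scores(values, k)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]
```

For a series such as `[10.0, 10.5, 9.8, 10.2, 55.0, 10.1]`, the reading of 55.0 is far from all of its neighbors and therefore ranks first.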

Step 2.2: Use an improved version of the clustering-based anomaly detection algorithm DBSCAN: first perform density clustering on small samples of data using local parameters, then iteratively cluster the local clustering results to obtain the final global clustering result, and mark points that belong to no cluster as outliers. This specifically includes:

Step a) Parameter update:

Set the size M of the clustering sliding window and compute the average distance difference of the power load data within the window; set MinPts to the first k neighboring power loads and set Eps to the Euclidean distance between power load data;

A weight is set for each power load data point to reduce its impact on the final clustering result. The weight $w(c_i, c_j)$ is calculated as follows:

[Formula image BDA0003316420310000036]

[Formula image BDA0003316420310000037]

where $\mathrm{Cov}(c_i, c_j)$ is the covariance of the power load data $c_i$ at time i and the power load data $c_j$ at time j, $\mathrm{Var}(c_i)$ is the variance of the power load data $c_i$ at time i, and $\mathrm{Var}(c_j)$ is the variance of the power load data $c_j$ at time j; $r(c_i, c_j)$ is the correlation coefficient of $c_i$ and $c_j$, and the smaller its value, the greater the correlation;

Step b) Obtain the anomaly detection result for the power load data by analyzing the clustering results:

During clustering, a point marked as abnormal for the first time is set as a candidate outlier and its anomaly score is increased by 1; as the iterative clustering loop proceeds to the next candidate outlier, the anomaly score is updated. If the anomaly score S equals the number of clusters C, the point is marked as an outlier.
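One possible reading of Step 2.2 is sketched below: a minimal DBSCAN is run on each sliding window, a point's score is incremented each time it is labelled as noise, and the point is flagged only if it was noise in every clustering it took part in. This is an interpretation under stated assumptions, not the patented procedure. The parameters `window`, `eps`, and `min_pts` stand in for M, Eps, and MinPts and are fixed rather than updated per window, and the weighting $w(c_i, c_j)$ is omitted because its formula is not available.

```python
def dbscan_noise_1d(values, eps, min_pts):
    """Minimal 1-D DBSCAN; returns the set of indices labelled as noise."""
    n = len(values)
    # Each point counts itself as a neighbour, as in standard DBSCAN.
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbours]
    clustered = set()
    for i in range(n):
        if not core[i] or i in clustered:
            continue
        stack = [i]
        clustered.add(i)
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbours[p]:
                if q not in clustered:
                    clustered.add(q)
                    stack.append(q)
    return set(range(n)) - clustered

def windowed_outliers(values, window, eps, min_pts):
    """Flag points that are noise in every sliding window containing them."""
    n = len(values)
    noise_votes = [0] * n   # anomaly score S per point
    appearances = [0] * n   # number of clusterings the point took part in
    for start in range(0, n - window + 1):
        idx = range(start, start + window)
        noise = dbscan_noise_1d([values[i] for i in idx], eps, min_pts)
        for local, i in enumerate(idx):
            appearances[i] += 1
            if local in noise:
                noise_votes[i] += 1
    return [i for i in range(n)
            if appearances[i] and noise_votes[i] == appearances[i]]
```

Requiring agreement across all windows makes the detector conservative: a point that looks isolated in one local window but fits a cluster in another is kept.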

Further, the autoregressive interpolation in Step 2 is specifically as follows:

The Lagrange interpolation method is used to fill in missing power load data values, so that a polynomial of degree n-1, $y = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$, passes through the n points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$. The Lagrange interpolation function is then expressed as:

$$y = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

where $x_i$ and $x_j$ denote the i-th and j-th moments of the power user, $y_i$ denotes the power load of the power user at the i-th moment, and n denotes the total number of moments of power load.
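The Lagrange interpolation step can be sketched as a short function that evaluates the interpolating polynomial at a missing time point. The function name and the choice of which known points to pass in are assumptions; the source does not say how many neighboring readings are used.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if len(set(xs)) != len(xs):
        raise ValueError("xs must be distinct")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor (x - x_j) / (x_i - x_j)
                term *= (x - xj) / (xi - xj)
        total += term
    return total
```

For example, with known readings at times 0, 1, and 3 lying on the curve x² + 1, the value estimated at the missing time 2 is 5. High-degree polynomials through many equally spaced points can oscillate strongly, so passing only a few readings on either side of the gap is a common precaution.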

Further, the sequence data normalization in Step 2 uses the following formula:

[Formula image BDA0003316420310000042]

where $X = \{X_1, X_2, X_3, \dots, X_N\}$ denotes power load data or environmental impact data of the same category; its normalized values are $Y = \{Y_1, Y_2, Y_3, \dots, Y_N\}$, $l \in [1, N]$, and N is the total number of power load data or environmental impact data items in that category.
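The normalization formula itself is not available in this text. The sketch below assumes min-max scaling to [0, 1], a common choice for sequence inputs; the patented formula may differ (for example, z-score standardization), so this is an illustration only.

```python
def min_max_normalize(xs):
    """Scale one category of values to [0, 1]; min-max scaling is an assumption."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]
```

Each category (load, temperature, humidity, and so on) would be normalized separately so that features with large numeric ranges do not dominate training.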

Further, Step 3 specifically includes:

Step 3.1: Load and prepare the data for the network model

Load the dataset and split it into a training set, a validation set, and a test set, then construct a Dataloader as the data reader for the training set and the test set respectively. During training the model computes on one batch of data at a time: the Dataloader loads data into memory according to the batch size and shuffles the order of the data in each batch to improve the robustness of model training;
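The data preparation in Step 3.1 can be approximated in plain Python as follows. The source refers to a Dataloader without naming a framework, so this sketch deliberately avoids any library API; the window length, the 70/15/15 split, and the function names are assumptions. Here the sample order is shuffled once per pass before batches are cut.

```python
import random

def make_windows(series, history):
    """Turn a sequence into (input window, next value) training pairs."""
    return [(series[i:i + history], series[i + history])
            for i in range(len(series) - history)]

def split(samples, train_pct=70, val_pct=15):
    """Chronological split into training, validation, and test sets."""
    n_train = len(samples) * train_pct // 100
    n_val = len(samples) * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

def batches(samples, batch_size, shuffle=True, seed=None):
    """Yield lists of samples of at most batch_size, optionally shuffled."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]
```

Splitting chronologically before shuffling keeps later readings out of the training set, which matters for time series because neighboring windows overlap.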

Step 3.2: Build the power load forecasting neural network model based on the improved CNN-LSTM

The neural network model consists of three parts: a CNN feature learning module, an LSTM sequence learning module, and a self-attention mechanism module:

1) The CNN feature learning module includes three one-dimensional convolutional layers, with a MaxPooling layer and a ReLU layer added between each two consecutive convolutional layers. The convolution operation learns features of the standardized power load data and environmental impact data, which form the feature maps output by the convolutional layers; the MaxPooling layer is added to alleviate the limitation of the invariance of the generated feature map, and the ReLU activation function strengthens the model's ability to learn complex structures;
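To make the building blocks of the CNN module concrete, the sketch below implements a single-channel 1-D convolution, max pooling, and ReLU in plain Python and chains them in the order listed above. Kernel values, kernel size, and pool size are assumptions; a real model would learn the kernels and use multiple channels.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation, which is what deep learning
    frameworks usually call convolution."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

def relu(signal):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in signal]

def cnn_block(signal, kernel, pool=2):
    """One convolution followed by MaxPooling and ReLU."""
    return relu(max_pool1d(conv1d(signal, kernel), pool))
```

With the difference kernel `[-1.0, 0.0, 1.0]`, the block responds to local rises in the load curve, which is the kind of local feature the CNN module is meant to pick up before the LSTM layers model longer-range order.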

2) The LSTM sequence learning module includes three LSTM layers, each containing twenty neurons. The first two LSTM layers output the complete sequence of hidden states, while the last LSTM layer outputs the hidden state at the last time step;

The input of an LSTM layer is the influencing parameter $x_t$ at time t and the predicted value $h_{t-1}$ at the previous time; the prediction function F yields the predicted value $h_t$ at time t. The function is expressed as:

$h_t = F(x_t, h_{t-1})$

3) A self-attention mechanism module is inserted between each pair of adjacent LSTM layers. It assigns weights to the hidden layer features extracted by the LSTM layer, thereby mining more discriminative features of the power load data and environmental impact data;

At time t, the hidden layer output sequence of the penultimate layer of the LSTM network is assigned the self-attention weight $w_{tl}$, expressed as:

[Formula image BDA0003316420310000051]

where $L_h$ is the length of the sequence output by the LSTM hidden layer, l is the index within the LSTM hidden layer output sequence, and $s_{tl}$ is the similarity between sequence l of the LSTM hidden layer at time t and the other sequences;

The feature $h_{tl}$ of sequence l is multiplied by its corresponding weight $w_{tl}$ to form the new feature sequence $h'_{tl}$, which is fed into the next LSTM;
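The reweighting step can be sketched as follows. Since the weight formula is not available here, two assumptions are made: the similarity $s_{tl}$ is taken as the mean dot product between vector l and the other vectors, and the weights are obtained with a softmax over the $L_h$ similarities. Both are common choices but are not confirmed by the source.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reweight_hidden(hidden):
    """hidden: list of L_h feature vectors output by one LSTM layer.
    Returns (weights, reweighted vectors)."""
    n = len(hidden)
    # Assumed similarity: mean dot product with every other vector.
    sims = [
        sum(dot(hidden[l], hidden[o]) for o in range(n) if o != l) / max(n - 1, 1)
        for l in range(n)
    ]
    weights = softmax(sims)
    reweighted = [[w * v for v in h] for w, h in zip(weights, hidden)]
    return weights, reweighted
```

Vectors that resemble the rest of the sequence receive larger weights, so their features are emphasized in the input to the next LSTM layer.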

Step 3.3: Define the optimizer and set the model training parameters

Use the mean absolute error as the loss function to monitor the validation loss, and use the adaptive optimizer Adam, which adaptively updates the learning rate during training; also set the number of training epochs, the initial loss, and the batch size.
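For reference, the two ingredients named in Step 3.3 can be written out in a few lines. The sketch shows the mean absolute error and a single Adam update for one scalar parameter; the hyperparameter defaults are the commonly used Adam values, not settings taken from the source, and a real model would apply the update to every weight.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; state is (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```

The division by the square root of the second-moment estimate is what gives each parameter its own effective step size, which is the adaptive behavior the text refers to.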

Further, the power load forecasting in Step 4 includes ultra-short-term, short-term, and medium- and long-term power load forecasting.

The beneficial effects of the present invention are:

1) The multi-factor power load forecasting method based on deep learning proposed in the present invention remedies the shortcomings of existing approaches, which draw on a single data source, cannot account for complex factors such as meteorological data, and produce forecasts that differ greatly from the true values. By introducing external environmental factors such as temperature, humidity, and wind force as influencing factors of the power load and combining them with historical power load data as input features, the method can forecast the current power load, and taking environmental factors into account effectively improves forecasting accuracy.

2) The method proposes a way of detecting abnormal power load data that combines KNN with an improved DBSCAN, which can effectively identify outliers in the power load data; restoring missing or abnormal power load data reduces the impact of outliers on the forecasting model.

3) The method proposes an improved CNN-LSTM power load forecasting model. First, a CNN feature extraction module learns local features of the input data; the result is then fed into the LSTM sequence learning model to extract sequence feature information of the input data; at the same time, a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, and key features are extracted by assigning different attention weights so as to improve the final forecasting accuracy; finally, the power load is forecast through a Dropout layer and an FC layer. In addition, the improved CNN-LSTM forecasting model not only achieves high forecasting accuracy but also simplifies the neural network construction process and supports end-to-end deployment, which helps put research results into use quickly.

4) The method forecasts ultra-short-term, short-term, and medium- and long-term power load for different categories of users, visualizes the forecasting results, and supports analysis of electricity consumption before and after the epidemic, analysis of electricity consumption distribution, and power user profiling. The present invention can promote the digital upgrading of the power grid, meet users' personalized needs, and support industry correlation analysis, power generation dispatching, electricity consumption trend forecasting, and guidance for the resumption of work and production. The present invention can serve as a standalone system or be embedded as a component in the power department's existing systems, which can save considerable resources and improve work efficiency.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the overall process of the present invention.

FIG. 2 is a schematic diagram of the data processing flow of the present invention.

FIG. 3 is a schematic structural diagram of the power load forecasting model based on the improved CNN-LSTM proposed in the present invention.

FIG. 4 is a line chart comparing the ultra-short-term forecast values of commercial power load in a certain region with the actual values; (a) compares the forecast and actual values of the region's commercial power load on May Day 2019; (b) compares the region's May Day commercial power load in 2018, 2019, and 2020.

FIG. 5 is a line chart comparing the short-term forecast values of commercial power load in a certain region with the actual values.

FIG. 6 is a line chart comparing the medium- and long-term forecast values of commercial power load in a certain region with the actual values; (a) is the weekly power load forecast for the region; (b) is the monthly power load forecast for the region.

FIG. 7 is a schematic diagram of an application scheme of the present invention.

DETAILED DESCRIPTION

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The present invention introduces external environmental factors such as temperature, humidity, and wind force as influencing factors of the power load, and preprocesses and standardizes the data using abnormal data detection based on the k-nearest-neighbor (KNN) algorithm and an improved DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, autoregressive interpolation, and sequence data normalization. It then proposes an improved CNN-LSTM power load forecasting model: a CNN feature extraction module first learns the local features of the input data; its output is fed into an LSTM sequence learning model, which extracts the sequence feature information of the input data; and a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final forecast accuracy. Finally, ultra-short-term, short-term, and medium- and long-term power load forecasting is performed, which assists in analyzing power consumption before and after the epidemic, analyzing the distribution of power consumption, and building power user profiles. The present invention can promote the digital upgrading of the power grid, meet users' individual needs, and support industry correlation analysis, power generation dispatching, consumption trend forecasting, and guidance for the resumption of work and production.

FIG. 1 shows the specific implementation flow of the deep-learning-based multi-factor power load forecasting method of the present invention, which comprises: acquiring and storing power load data and external environment data; cleaning and preprocessing the data based on KNN and the improved DBSCAN algorithm; building and training the improved CNN-LSTM model; and forecasting the power load. The specific implementation steps are as follows:

Step 1: Acquire load data for different regions, years, and power consumption categories, together with data on external environmental factors such as temperature, humidity, and wind force, and store them in an SQLite database. The specific steps are as follows:

1) The load data for different regions, years, and power consumption categories are obtained from channels such as the State Grid user power consumption information release platform. The data fields include the user number (desensitized), region, urban/rural network, user classification (for example high voltage, charging pile, grid-connected high-voltage self-provided power plant, low-voltage non-residential, low-voltage residential), power consumption category (household, commercial, urban, rural, etc.), customer category (normal customer, changed customer), voltage level (ultra-high voltage, high voltage, low voltage), industry category, contract capacity, operating capacity, date of first power supply, date of the frozen energy reading, and the frozen energy readings at 15-minute intervals. Table 1 describes the power load data fields. The power load data are stored in an SQLite database to facilitate data migration and system deployment.

Table 1 Description of the power load data fields

Figure BDA0003316420310000071

2) A Python crawler program acquires and stores the data on external environmental factors such as temperature, humidity, and wind force. The crawling process involves simulated login, page retrieval and parsing, data structure design, and storing the external environment data in the database. The data are obtained from the official interface of the national meteorological website. The data fields include the current date, region, maximum temperature, minimum temperature, average humidity, average wind force, weather type (a value in the range [1,4] indicating the degree from overcast to sunny), the current temperature at 15-minute intervals, and whether the day is a holiday (0 or 1). Table 2 describes the external environment data fields. The timestamps of the environmental data are synchronized with those of the power load data. Likewise, the environmental data are stored in an SQLite database to facilitate data migration and system deployment.

Table 2 Description of the external environment data fields

| Field name | Field meaning |
| --- | --- |
| WEATHER_DATE | Current date |
| WEATHER_ADDR | Region |
| MAX_TEMP | Maximum temperature |
| LOW_TEMP | Minimum temperature |
| AVERAGE_HUMIDITY | Average humidity |
| AVERAGE_WIND | Average wind force |
| WEATHER_TYPE | Weather type (range [1,4], from overcast to sunny) |
| R1_WEATHER | Current temperature at 15-minute intervals |
| IS_HOLIDAY | Whether the day is a holiday (0 or 1) |
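As an illustration of how the Table 2 fields could be stored, the following is a minimal sketch using Python's standard `sqlite3` module. The column names come from Table 2; the table name `weather`, the SQL column types, the `CHECK` constraints, and the sample row are assumptions made for this example and are not specified by the source.

```python
import sqlite3

# Column names follow Table 2. The table name, SQL types and CHECK
# constraints are assumptions for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS weather (
    WEATHER_DATE     TEXT NOT NULL,
    WEATHER_ADDR     TEXT NOT NULL,
    MAX_TEMP         REAL,
    LOW_TEMP         REAL,
    AVERAGE_HUMIDITY REAL,
    AVERAGE_WIND     REAL,
    WEATHER_TYPE     INTEGER CHECK (WEATHER_TYPE BETWEEN 1 AND 4),
    R1_WEATHER       REAL,
    IS_HOLIDAY       INTEGER CHECK (IS_HOLIDAY IN (0, 1))
)
"""


def open_weather_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_weather(conn, row):
    """Insert one record; `row` lists the nine fields in Table 2 order."""
    conn.execute("INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", row)
    conn.commit()


# Made-up sample record, used only to exercise the schema.
conn = open_weather_db()
insert_weather(conn, ("2019-05-01", "region-A", 27.0, 16.0, 55.0, 2.0, 4, 21.5, 1))
rows = conn.execute(
    "SELECT WEATHER_DATE, MAX_TEMP, IS_HOLIDAY FROM weather"
).fetchall()
```

Because an SQLite database is a single file, passing a file path instead of `":memory:"` gives the kind of easy migration and deployment the description mentions.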

Step 2: Clean and preprocess the power load data and the environmental impact data, including abnormal data detection based on the k-nearest-neighbor (KNN) algorithm and the improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization. FIG. 2 is a schematic diagram of the data processing flow of the present invention.

The specific steps are as follows:

1) Detection of abnormal power load data based on the KNN algorithm and the improved DBSCAN algorithm. Because of power equipment failures, human error, and similar factors, a small amount of the acquired user power load data is missing or abnormal. Such data strongly affect the accuracy of the load forecast, so they must be detected and the normal values restored by methods such as elimination or interpolation.

The present invention proposes an outlier detection method based on an improved KNN algorithm. The basic rule of the KNN algorithm is to find, for a sample c, its k nearest or most similar samples in the feature space (1<k<N, where N is the total number of samples); if most of these k neighbors belong to one category, then c also belongs to that category. The average distance between sample c and its neighbors is taken as the anomaly score of c: the higher the score, the more anomalous the sample. To better reflect anomalies in the power load data, after the improved KNN algorithm has produced an anomaly score for the power load data at each moment, the present invention uses the total distance from each cluster to its respective k neighbors as the final anomaly score. The set N_k(c_i) of the k neighbors of the power load data c_i at moment i is expressed as:

Figure BDA0003316420310000091

Figure BDA0003316420310000092

where the symbol in Figure BDA0003316420310000093 denotes one of the k neighbors of c_i, d_k(c_i) is the average distance from c_i to its k neighbors, and the symbol in Figure BDA0003316420310000094 denotes the distance between c_i and its neighbor (Figure BDA0003316420310000095). The anomaly score of the power load data c_i is then expressed as:

Figure BDA0003316420310000096

where N_k(c_i) denotes the set of the k neighbors of c_i.

Finally, the top m clusters in the list sorted by anomaly score are output as the outliers of the power load data.
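The scoring idea can be sketched as follows. This is only the plain KNN baseline described in the text (the mean distance from each value to its k nearest neighbors, then the top m scores); the aggregated score of the improved algorithm is given in the source only as a formula image and is not reproduced here. Treating each load reading as a single number with absolute difference as the distance, and the function names and sample values, are assumptions for this example.

```python
def knn_anomaly_scores(values, k):
    """Mean distance from each value to its k nearest other values."""
    if not 1 <= k < len(values):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    scores = []
    for i, v in enumerate(values):
        # Distances to every other point, smallest first.
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def top_m_outliers(values, k, m):
    """Indices of the m values with the highest anomaly scores."""
    scores = knn_anomaly_scores(values, k)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]


# Made-up load readings; the fifth one is far from the rest.
loads = [10.0, 10.5, 9.8, 10.2, 30.0, 10.1]
outliers = top_m_outliers(loads, k=2, m=1)
```

With these sample readings the value 30.0 receives by far the largest score, so its index is the one returned.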

Because abnormal and normal clusters interfere with each other when the KNN algorithm computes the k neighbors of cluster c_i, the anomaly detection results are prone to false detections and missed detections. To solve this problem, the present invention proposes an improved clustering-based anomaly detection algorithm built on DBSCAN. DBSCAN is a typical density-based clustering algorithm: it groups closely connected samples into clusters to obtain the final clustering result, and points that belong to no cluster are regarded as outliers. Whereas the traditional DBSCAN algorithm clusters with globally uniform parameters Eps and MinPts, the present invention proposes an improved data partitioning method based on a sliding window: local parameters are first used to perform density clustering on small samples of data, and the local clustering results are then clustered iteratively to obtain the final global clustering result. The improved DBSCAN algorithm comprises three steps: parameter update, clustering, and anomaly detection.

During the parameter update, the size M of the clustering sliding window is set first and the average distance difference of the power load data within the window is computed; the first k neighboring power loads are taken as MinPts, and the Euclidean distance between power load data is taken as Eps. To mitigate inconsistency among the power load data, a weight is assigned to each power load datum to reduce its influence on the final clustering result. The weight w(c_i, c_j) is computed as follows:

Figure BDA0003316420310000097

Figure BDA0003316420310000098

where Cov(c_i, c_j) is the covariance of the power load data c_i at moment i and the power load data c_j at moment j, Var(c_i) is the variance of c_i, and Var(c_j) is the variance of c_j; r(c_i, c_j) is the correlation coefficient of c_i and c_j, and the smaller its value, the greater the correlation. The anomaly detection result for the power load data is obtained by analyzing the clustering results. During clustering, a point marked as abnormal for the first time is set as a candidate outlier and its anomaly score is incremented by 1 (the initial value is 0); as the iterative clustering loop proceeds to the next candidate outlier, the anomaly scores are updated. If the anomaly score S of a point equals the number of clusters C, the point is marked as an outlier.
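The quantities named above (covariance and the two variances) are the ingredients of the standard Pearson correlation coefficient, which the following sketch computes. It assumes that each of c_i and c_j is a short series of readings of equal length, since a covariance is not defined for two single numbers, and it assumes the usual definition r = Cov / sqrt(Var · Var). How the source maps r to the weight w(c_i, c_j) is given only in the formula image and is not reproduced here.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length series (population form)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


# Made-up series: the second doubles the first, the third reverses it.
r_same = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_opposite = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Under this definition r always lies in the interval [-1, 1].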

2) Missing power load values are usually filled by interpolation. The present invention applies Lagrange interpolation to the missing power load data. The idea of interpolation is to build a suitable interpolation function f(x) from the known points; the value at an unknown point x_i is then given by f(x_i), so that (x_i, f(x_i)) can be used to approximate the missing point.

The idea of Lagrange interpolation is to make a polynomial of degree n-1, y = a_0 + a_1 x + a_2 x^2 + ... + a_{n-1} x^{n-1}, pass through the n points (x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n). The Lagrange interpolation function can then be expressed as:

$$f(x) = \sum_{i=1}^{n} y_i \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$

where x_i denotes the i-th moment of the power user, y_i denotes the user's power load at the i-th moment, and n denotes the total number of moments of power load data.
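A minimal sketch of filling one missing reading with Lagrange interpolation follows. It evaluates the standard Lagrange form (each known value y_i multiplied by the product of (x - x_j)/(x_i - x_j) over all j ≠ i) at the missing moment. The function name and the sample points are assumptions for this example; the sample points lie on y = x², so the result can be checked by hand.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at x."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if len(set(xs)) != len(xs):
        raise ValueError("the known moments xs must be distinct")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor: equals 1 at x = xi and 0 at x = xj.
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Known moments 1, 2, 4, 5 with loads on y = x**2; moment 3 is missing.
known_x = [1, 2, 4, 5]
known_y = [1.0, 4.0, 16.0, 25.0]
filled = lagrange_interpolate(known_x, known_y, 3)
```

In practice it is common to interpolate from only a few readings on either side of the gap, because a high-degree polynomial through many points can oscillate strongly between them.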

3) Normalize the user power consumption data and the environmental data. Data of different categories, and even of the same category, differ greatly in magnitude: power load values and temperatures differ greatly in numerical range, and so do the power loads of different user types. Without normalization, model training is strongly affected, and the influence of a given parameter may be masked or amplified. The normalization formula for each indicator is therefore:

Figure BDA0003316420310000102

where X = {X_1, X_2, X_3, ..., X_N} denotes data of one category, such as power load data, temperature data, or wind force; Y = {Y_1, Y_2, Y_3, ..., Y_N} denotes the normalized values; and N is the total number of power load or environmental impact data items in that category.
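The normalization formula itself survives only as an image, so the following sketch assumes min-max scaling to [0, 1], one common choice for this kind of per-category normalization; the source may use a different formula. The function name, the handling of a constant series, and the sample temperatures are likewise assumptions.

```python
def min_max_normalize(xs):
    """Scale one category of data to [0, 1] (assumed min-max form)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


# Made-up daily maximum temperatures.
temps = [16.0, 20.0, 24.0]
scaled = min_max_normalize(temps)
```

Each category (load, temperature, humidity, and so on) would be scaled separately, so that no single input dominates training merely because of its units.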

Step 3: Build the power load forecasting network model based on the improved CNN-LSTM. FIG. 3 is a schematic diagram of the model proposed by the present invention. CNN and LSTM are among the most widely used deep learning techniques. A CNN can extract valuable features and filter noise from the input data, while an LSTM network can capture sequential pattern information. Although an LSTM network can process time-correlated sequence information, it uses only the attributes provided in the training set. A CNN, by contrast, can extract local features and the same feature appearing in different regions, but it is not designed to process temporal information. A hybrid model that combines the strengths of both techniques can therefore improve forecast accuracy. In addition, the present invention introduces a self-attention mechanism to improve the CNN-LSTM, enabling feature reconstruction of the LSTM hidden layers.

The specific steps are as follows:

Step 3.1) Prepare the training data for the power load forecasting model. The power load data and environmental data processed in Step 2 serve as the training data. Taking short-term (daily) forecasting as an example, the training data are the preprocessed daily power load of a user over the 365 days of a year together with environmental factors such as the maximum temperature, minimum temperature, average temperature, and average humidity. The power load forecasting model is a neural network. First, the data set is loaded and divided into training, validation, and test sets in the ratio 7:2:1. A DataLoader is then constructed for the training set and for the test set to read the data. During training, computation is performed on one batch at a time: the DataLoader loads data into memory according to the batch size and shuffles the data of each batch, which improves the robustness of model training. The present invention sets the batch size to 10.
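The split and batching described in this step can be sketched with the standard library alone. This is a stand-in for the PyTorch DataLoader named in the text, not the source's code. The 7:2:1 ratio and the batch size of 10 come from the description; splitting in chronological order, the function names, and the use of a seed are assumptions for this example.

```python
import random


def split_7_2_1(samples):
    """Split samples into train/validation/test parts in the ratio 7:2:1."""
    n = len(samples)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


def iter_batches(samples, batch_size=10, shuffle=True, seed=None):
    """Yield batches of at most batch_size samples, optionally shuffled."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


# One placeholder sample per day of a 365-day year.
days = list(range(365))
train, val, test = split_7_2_1(days)
batches = list(iter_batches(train, batch_size=10, seed=0))
```

For 365 daily samples this gives 255 training, 73 validation, and 37 test samples, and the training set is served in 26 batches, the last of which holds the remaining 5 samples.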

Step 3.2) Build the power load forecasting model. The deep learning model of the present invention is implemented in Python, and the improved CNN-LSTM power load forecasting model is built on the PyTorch deep learning library. The model comprises three parts: a CNN feature learning module, an LSTM sequence learning module, and a self-attention mechanism module.

The CNN feature learning module consists of three one-dimensional convolutional layers, with a MaxPooling layer and a ReLU layer inserted between every two consecutive convolutional layers. The convolution operations learn the features of the normalized power load data and environmental data and can extract more discriminative features from the input. The feature map output by a convolutional layer has one limitation: it records the precise position of the features in the input data. A MaxPooling layer is therefore usually added after a convolutional layer to relieve this limitation by making the generated feature map less sensitive to position, while the ReLU activation function enhances the model's ability to learn complex structures, which reduces the overall computational load and makes the network easier to train.

When forecasting the power load, the temporal correlation of the load data should be fully considered. Compared with a traditional recurrent neural network, an LSTM can learn long-term dependencies in a time series and is suited to learning from long-period power load data, so using an LSTM for load forecasting in the present invention can yield higher forecast accuracy. The inputs are the influencing parameters x_t at moment t and the predicted value h_{t-1} at the previous moment; the prediction function F produces the predicted value h_t at moment t. The function is expressed as follows:

h_t = F(x_t, h_{t-1})

By adding memory cells and control gates to the recurrent neural network, the LSTM strengthens its ability to handle long-sequence dependencies and performs well in power load forecasting.

In the LSTM sequence learning module, the present invention uses three LSTM layers, each containing twenty neurons. The first two LSTM layers output the complete sequence of hidden states, while the last LSTM layer outputs only the hidden state of the final time step.

To fully exploit the contribution of the LSTM's internal hidden-layer features to the model's final forecasting ability, the present invention inserts a self-attention module between each pair of adjacent LSTM layers among the three LSTMs of the sequence learning module. The self-attention mechanism assigns weights to the hidden-layer features extracted by the LSTM, thereby mining more discriminative features of the power load data and environmental data and improving the final forecast accuracy. At moment t, the hidden-layer output sequence of the penultimate layer of the LSTM network is assigned the self-attention weight w_tl, expressed as:

Figure BDA0003316420310000121

where L_h is the length of the sequence output by the LSTM hidden layer, l is the index within the LSTM hidden-layer output sequence, and s_tl denotes the similarity between sequence l of the LSTM hidden layer at moment t and the other sequences;

The feature h_tl of sequence l is multiplied by its corresponding weight w_tl to form a new feature sequence h'_tl, which is input to the next LSTM. Introducing the self-attention mechanism to weight the LSTM hidden-layer features effectively captures the key information in the sequence, thereby improving forecast accuracy and efficiency.
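The reweighting step can be sketched as follows. The weight formula is available only as an image, so this example assumes the common softmax form, in which each weight w_tl is exp(s_tl) divided by the sum of exp(s_tl) over the L_h sequence positions; the source may define it differently. The similarity scores are taken as given inputs, and the function names and sample values are assumptions.

```python
import math


def attention_weights(scores):
    """Softmax over similarity scores (assumed form of w_tl)."""
    peak = max(scores)
    # Subtracting the maximum keeps exp() from overflowing.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(hidden, scores):
    """Multiply each hidden feature vector h_tl by its weight w_tl."""
    weights = attention_weights(scores)
    return [[w * h for h in vec] for w, vec in zip(weights, hidden)]


# Made-up hidden features for L_h = 2 positions and their similarity scores.
hidden = [[1.0, 2.0], [3.0, 4.0]]
scores = [0.0, math.log(3.0)]
weights = attention_weights(scores)
reweighted = reweight(hidden, scores)
```

With these scores the second position is three times as heavily weighted as the first, so the weights are approximately 0.25 and 0.75, and the reweighted sequence is what would be passed to the next LSTM layer.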

In deep learning models, a dropout layer randomly selects neurons and deactivates some of them during training to prevent overfitting. In the present invention, a dropout layer is added between the CNN feature extraction block and the LSTM sequence learning block to prevent overfitting. The output of the LSTM sequence learning block is also connected to a dropout layer, followed by a fully connected (FC) layer that produces the final output.

Step 3.3) Define the optimizer and set the model training parameters. The training, validation, and test data are in the ratio 7:2:1, and the mean absolute error (MAE) is used as the loss function to monitor the validation loss. The optimizer is the adaptive optimizer Adam with an initial learning rate of 0.001; Adam adapts the learning rate during training, so the model converges quickly and parameter tuning is easier. In addition, the number of training epochs is set to 700, the loss value is initialized to 0 at the start of each epoch, and the batch size is set to 10.
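The loss used to monitor validation can be written out directly: MAE is the mean of the absolute differences between actual and forecast values. The sketch below also collects the hyperparameters stated in this step in one place. The dictionary, its key names, the function name, and the sample loads are assumptions for this example; only the numeric settings come from the description.

```python
# Settings stated in Step 3.3; the key names are illustrative.
TRAINING_CONFIG = {
    "optimizer": "Adam",
    "initial_learning_rate": 0.001,
    "epochs": 700,
    "batch_size": 10,
    "loss": "MAE",
}


def mean_absolute_error(y_true, y_pred):
    """Mean of |actual - forecast| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actual and forecast loads for three days.
actual = [100.0, 120.0, 90.0]
forecast = [110.0, 115.0, 90.0]
mae = mean_absolute_error(actual, forecast)
```

Because MAE is expressed in the same units as the load itself, a validation MAE can be read directly as the typical size of the forecast error.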

Step 4: Forecast the power load, which mainly includes ultra-short-term, short-term, and medium- and long-term power load forecasting. FIG. 4, FIG. 5, and FIG. 6 show line charts comparing the ultra-short-term, short-term, and medium- and long-term forecasts, respectively, with the users' actual power loads.

In FIG. 4, (a) compares the forecast and actual commercial power loads of a certain region on May Day 2019. Analysis of the actual and forecast values for that day shows that the forecast essentially reflects the actual consumption and can therefore provide dispatching guidance for power plants. (b) compares the region's May Day commercial power loads in 2018, 2019, and 2020; the comparison shows that the commercial load on May Day 2020 was more strongly affected by the epidemic than in the two preceding years.

FIG. 5 is a line chart comparing the short-term forecast and actual commercial power loads of a certain region. With the temperature factor introduced, the region's commercial power consumption in 2019 was forecast; the forecast values essentially reflect the actual load trend and are also consistent with the temperature trend. An anomaly appears around February 5, however, because most businesses in the region were closed for the Spring Festival holiday.

FIG. 6 shows line charts comparing the medium- and long-term forecast and actual commercial power loads of a certain region. (a) is the weekly power load forecast for the region; comparing the actual and forecast weekly commercial loads shows that the forecast essentially reflects the trend of the actual values. (b) is the monthly power load forecast for the region; comparing the actual and forecast monthly commercial loads shows that the forecast essentially reflects the trend of the actual values.

The specific steps are as follows:

Taking the commercial power consumption of a certain city as an example, the power load forecasting module is designed and implemented mainly from the perspective of a power plant's generation department. It forecasts the load in the ultra-short-term, short-term, and medium- and long-term dimensions, so that the generation department can track the city's daily, weekly, and monthly commercial power consumption and dispatch the power load effectively. For example, the city's 2019 commercial power load data at the per-moment, daily, weekly, and monthly granularities are acquired together with environmental data such as the daily maximum temperature, minimum temperature, average humidity, average wind force, and weather type. After preprocessing and normalization, these data are input to the LSTM model, which outputs the forecast load trends at the same granularities. Comparing the actual and forecast values reflects the model's forecast accuracy, and comparing the trends of factors such as temperature with the consumption trend reflects the influence of environmental factors on the power load.

The present invention is mainly applied in the power sector, for example to guide power plants in generation dispatching and to help power management departments analyze power consumption and mine user consumption information; it is also applicable to research and development departments for mining users' consumption patterns and improving the efficiency of power departments. FIG. 7 shows the application scheme of the present invention. Factors affecting the grid's power load include region, time, temperature, consumption type, and voltage type. The deep-learning-based multi-factor power load forecasting system designed here first preprocesses and standardizes the data using abnormal data detection based on the k-nearest-neighbor (KNN) algorithm and the improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization. It then applies the improved CNN-LSTM power load forecasting model: a CNN feature extraction module first learns the local features of the input data; its output is fed into an LSTM sequence learning model, which extracts the sequence feature information of the input data; and a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final forecast accuracy. Finally, ultra-short-term, short-term, and medium- and long-term power load forecasting is performed, which assists in analyzing power consumption before and after the epidemic, analyzing the distribution of power consumption, and building power user profiles. The present invention can be applied by power departments to forecast users' consumption trends and plan staged power consumption rationally; it can help power departments dispatch power plant output, and it can help guide the resumption of work and production and industry correlation analysis.

Claims (5)

1. A multi-factor power load prediction method based on deep learning is characterized by comprising the following steps:
step 1: acquiring power load data covering different areas, years and electricity-use categories, and environmental influence data including temperature, humidity and wind power, and storing the data in a SQLite database;
step 2: cleaning and preprocessing the power load data and the environmental impact data, comprising: abnormal data detection based on a k-nearest neighbor (KNN) algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization;
step 3: introducing a self-attention mechanism to perform feature reconstruction on the hidden layers of the LSTM so as to realize end-to-end deployment, constructing an electrical load data prediction neural network model based on the improved CNN-LSTM, and using the electrical load data and the environmental impact data processed in step 2 as a training set and a test set;
the step 3 specifically includes:
step 3.1: for the loading and preparation of network model data:
loading the data set and dividing it into a training set, a verification set and a test set; constructing a DataLoader as a data reader for the training set and for the test set respectively; during model training, data is processed batch by batch, the DataLoader loads the data into memory according to the batch size, and the data of each batch is shuffled to improve the robustness of model training;
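The loading step above can be illustrated with a minimal, framework-free sketch. It is not the patented implementation: the function names `chronological_split` and `iter_batches`, the split fractions, and the toy samples are all assumptions chosen for illustration, and a real system would more likely rely on a deep learning framework's data loader.

```python
import random

def chronological_split(samples, train_frac, val_frac):
    """Split an ordered list of samples into train / validation / test parts."""
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

def iter_batches(samples, batch_size, shuffle=True, seed=None):
    """Yield mini-batches of samples, shuffling the sample order if requested."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

# Hypothetical toy samples: (input window, target load) pairs.
samples = [([float(t), float(t + 1)], float(t + 2)) for t in range(40)]
train, val, test = chronological_split(samples, train_frac=0.75, val_frac=0.125)
batches = list(iter_batches(train, batch_size=4, seed=0))
```

Only the training batches are shuffled here; keeping the validation and test sets in time order is a common choice for time series, although the claim does not specify it.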
step 3.2: constructing an electrical load data prediction neural network model based on the improved CNN-LSTM:
the neural network model comprises a CNN feature learning module, an LSTM sequence learning module and an attention mechanism module:
1) The CNN feature learning module comprises three one-dimensional convolution layers, wherein a MaxPooling layer and a ReLU layer are added between every two consecutive convolution layers; the features of the normalized power load data and the normalized environmental impact data are learned through convolution operations and output by the convolution layers as feature maps; the MaxPooling layer is added to relieve the limitation of invariance of the generated feature maps, and the ReLU activation function enhances the capability of the model to learn complex structures;
2) The LSTM sequence learning module comprises three LSTM layers, each layer comprising twenty neurons; the first two LSTM layers output the complete sequence of the hidden state, and the last LSTM layer outputs the last time step of the hidden state;
the input of the LSTM layer is the influencing parameter $x_t$ at time $t$ and the predicted value $h_{t-1}$ of the previous moment, and the predicted value $h_t$ at time $t$ is obtained through the prediction function $F$; the function expression is as follows:
$h_t = F(x_t, h_{t-1})$
3) A self-attention mechanism module is added between the three LSTM layers and assigns weights to the hidden-layer features extracted by the LSTM layers, so as to mine more discriminative features of the electrical load data and the environmental impact data; the hidden-layer output sequence of the penultimate layer of the LSTM network at time $t$ is assigned a self-attention weight $w_{tl}$, expressed as:
Figure FDA0004119822040000011
in the formula, $L_h$ is the sequence length of the LSTM hidden-layer output, $l$ denotes the index within the LSTM hidden-layer output sequence, and $s_{tl}$ represents the direct similarity between sequence $l$ of the LSTM hidden layer at time $t$ and the other sequences;
the feature $h_{tl}$ of sequence $l$ is multiplied by its corresponding weight $w_{tl}$ to form a new feature sequence $h'_{tl}$, which is input into the next LSTM layer;
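The weighting described in item 3) can be sketched as follows. The claim's weight formula is an image that is not reproduced in this text, so the softmax form used here, $w_{tl} = \exp(s_{tl}) / \sum_{k=1}^{L_h} \exp(s_{tk})$, is an assumption based on the symbols defined around it; how the similarity scores $s_{tl}$ are computed is not stated either, so they are simply passed in. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reweight_hidden(hidden, scores):
    """Scale each hidden feature vector h_tl by its attention weight w_tl.

    hidden: list of L_h feature vectors (lists of floats)
    scores: list of L_h similarity scores s_tl (assumed to be given)
    """
    weights = softmax(scores)
    reweighted = [[w * v for v in h] for w, h in zip(weights, hidden)]
    return weights, reweighted

# Toy example with L_h = 3 hidden outputs of dimension 2.
hidden = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
scores = [0.0, 0.0, math.log(2.0)]
weights, new_hidden = reweight_hidden(hidden, scores)
```

In this toy case the third score is larger, so the third hidden output receives half of the total weight and the other two a quarter each; the reweighted sequence is what would be passed to the next LSTM layer.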
step 3.3: defining an optimizer, setting model training parameters:
monitoring the verification loss using the mean absolute error as the loss function, and using the adaptive optimizer Adam to adaptively update the learning rate during training; setting the number of training iterations (epochs), the initial loss function and the batch size;
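As a rough illustration of step 3.3, the sketch below computes the mean absolute error and applies the standard Adam update rule to a single scalar parameter. It is a toy under stated assumptions, not the patent's training code: the helper names, the hyperparameter values, and the one-parameter "model" (a constant prediction `b`) are all hypothetical.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds t, m and v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

# Toy model: predict the same constant b for every target and minimise the MAE.
targets = [1.0, 2.0, 3.0]
b = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
initial_loss = mean_absolute_error(targets, [b] * len(targets))
for _ in range(500):
    # Subgradient of the MAE with respect to b: the mean of sign(b - y).
    grad = sum((b > y) - (b < y) for y in targets) / len(targets)
    b = adam_step(b, grad, state, lr=0.01)
final_loss = mean_absolute_error(targets, [b] * len(targets))
```

The effective step size depends on the running moment estimates rather than on the raw gradient alone, which is the sense in which Adam adapts the update during training.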
step 4: predicting the electrical load using the improved CNN-LSTM electrical load data prediction neural network model;
the electric load data comprises a user number, a region to which the user belongs, an urban/rural network, a user classification, an electricity utilization type, a client type, a voltage grade, an industry type, a contract capacity, an operation capacity, a first power transmission date, a frozen electric quantity date and a frozen electric quantity of 15 minutes; the environmental impact data comprises the current date, the region, the highest temperature, the lowest temperature, the average humidity, the average wind power, the weather type, the current temperature of 15 minutes and whether the current temperature is a holiday or not; and the time of the environmental impact data is synchronized with the time of the electrical load data.
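The storage in step 1 and the time synchronization required at the end of claim 1 can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical and cover only a few of the listed fields; the join on region and timestamp is one simple way to align the two data sources, not necessarily the one used by the inventors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.executescript("""
CREATE TABLE load_data (
    user_id TEXT, region TEXT, ts TEXT, load_kwh REAL,
    PRIMARY KEY (user_id, ts)
);
CREATE TABLE env_data (
    region TEXT, ts TEXT, temperature REAL, humidity REAL, is_holiday INTEGER,
    PRIMARY KEY (region, ts)
);
""")

# Hypothetical 15-minute load readings for one user.
conn.executemany("INSERT INTO load_data VALUES (?, ?, ?, ?)", [
    ("U001", "A", "2021-01-01 00:00", 12.5),
    ("U001", "A", "2021-01-01 00:15", 12.9),
    ("U001", "A", "2021-01-01 00:30", 13.4),
])
# Hypothetical environmental records; the 00:30 record is missing on purpose.
conn.executemany("INSERT INTO env_data VALUES (?, ?, ?, ?, ?)", [
    ("A", "2021-01-01 00:00", 4.0, 0.71, 1),
    ("A", "2021-01-01 00:15", 3.8, 0.72, 1),
])

# Keep only load readings that have an environmental record at the same time.
rows = conn.execute("""
    SELECT l.user_id, l.ts, l.load_kwh, e.temperature, e.humidity, e.is_holiday
    FROM load_data AS l
    JOIN env_data AS e ON e.region = l.region AND e.ts = l.ts
    ORDER BY l.user_id, l.ts
""").fetchall()
conn.close()
```

An inner join silently drops load readings without a matching environmental record; a `LEFT JOIN` followed by interpolation of the missing values would be an alternative.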
2. The multi-factor electrical load prediction method based on deep learning according to claim 1, wherein in step 2, the abnormal data detection based on the k-nearest neighbor algorithm and the improved DBSCAN algorithm is specifically as follows:
step 2.1: defining the average distance between a sample and its neighboring samples as the anomaly score of the sample, obtaining the anomaly score of the electrical load data at each moment using a modified KNN algorithm, and taking the total distance from each cluster to its k neighboring samples as the final anomaly score;
the set of k nearest neighbors $N_k(c_i)$ of the electrical load data $c_i$ at a time $i$ is expressed as:
Figure FDA0004119822040000021
Figure FDA0004119822040000022
in the formula, the symbol shown in Figure FDA0004119822040000023 denotes one of the k neighboring points of $c_i$, $d_k(c_i)$ is the average distance from $c_i$ to its k neighbors, and the expression shown in Figure FDA0004119822040000024 denotes the distance between $c_i$ and its neighboring point shown in Figure FDA0004119822040000025;
the anomaly score of the electrical load data $c_i$ is expressed as:
Figure FDA0004119822040000026
in the formula, $N_k(c_i)$ denotes the set of k neighboring points of $c_i$;
finally, outputting the first m clusters of the abnormal score ranking list as abnormal values of the electric load data;
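A minimal sketch of the KNN-based scoring in step 2.1 is given below. It treats each load reading as a scalar point and scores it by the average absolute distance to its k nearest readings, then returns the top m scores. The claim's exact formulas are images not reproduced here, and the claim also speaks of clusters rather than single points, so this per-point version and the names `knn_anomaly_scores` and `top_m_anomalies` are simplifying assumptions.

```python
def knn_anomaly_scores(values, k):
    """Score each value by its average distance to its k nearest other values."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def top_m_anomalies(values, k, m):
    """Return the indices of the m values with the highest anomaly scores."""
    scores = knn_anomaly_scores(values, k)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]

# Hypothetical load readings with one obvious spike at index 4.
loads = [10.0, 10.5, 9.8, 10.2, 30.0, 10.1, 9.9]
scores = knn_anomaly_scores(loads, k=3)
suspects = top_m_anomalies(loads, k=3, m=1)
```

The brute-force distance computation is quadratic in the number of readings, which is acceptable for a short window but would call for a spatial index on long series.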
step 2.2: adopting DBSCAN, an improved cluster-based anomaly detection algorithm: first using local parameters to realize density clustering of small-sample data, then iteratively clustering the local clustering results to obtain the final global clustering result, and marking the points that do not belong to any cluster as anomalous points; specifically comprising the following steps:
step a) updating parameters:
setting the size M of the clustering sliding window, calculating the average distance difference of the electrical load data within the window, setting the first k neighboring electrical loads as MinPts, and setting the Euclidean distance between the electrical load data as Eps;
setting a weight for each load data item to reduce its impact on the final clustering result, the weight $w(c_i, c_j)$ being calculated as follows:
Figure FDA0004119822040000031
Figure FDA0004119822040000032
wherein $\mathrm{cov}(c_i, c_j)$ is the covariance of the electrical load data $c_i$ at time $i$ and the electrical load data $c_j$ at time $j$, $\mathrm{var}(c_i)$ is the variance of the electrical load data $c_i$ at time $i$, and $\mathrm{var}(c_j)$ is the variance of the electrical load data $c_j$ at time $j$; $r(c_i, c_j)$ relates $c_i$ and $c_j$: the smaller the value, the greater the correlation;
step b) obtaining an abnormal detection result of the electric load data by analyzing the clustering result:
in the clustering process, a point marked as anomalous for the first time is set as a candidate anomalous point and its anomaly score is incremented by 1; the cyclic iterative clustering process then proceeds to the next candidate anomalous point and updates the anomaly score; if the anomaly score S is equal to the cluster number C, the point is marked as an anomalous point.
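The voting idea in step b) can be sketched with a plain DBSCAN noise rule applied over sliding windows. This is a simplified stand-in, not the improved algorithm of the claim: Eps and MinPts are fixed instead of being updated per window, the correlation-based weights are omitted because their formulas are not reproduced in this text, and interpreting C as the number of clustering rounds that cover a point is an assumption. Function names are hypothetical.

```python
def dbscan_noise(values, eps, min_pts):
    """Return the indices that DBSCAN would label as noise for 1-D values.

    A point is noise if it is not a core point and has no core point within
    eps; a point's neighbourhood includes the point itself.
    """
    n = len(values)
    neighbors = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
                 for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return {i for i in range(n)
            if i not in core and not any(j in core for j in neighbors[i])}

def windowed_anomalies(values, window, step, eps, min_pts):
    """Flag points labelled as noise in every sliding window that contains them."""
    votes = [0] * len(values)   # anomaly score S per point
    rounds = [0] * len(values)  # number of clustering rounds covering the point
    for start in range(0, len(values) - window + 1, step):
        idx = range(start, start + window)
        noise = dbscan_noise([values[i] for i in idx], eps, min_pts)
        for local, i in enumerate(idx):
            rounds[i] += 1
            if local in noise:
                votes[i] += 1
    return [i for i in range(len(values))
            if rounds[i] > 0 and votes[i] == rounds[i]]

# Hypothetical load readings with a spike at index 3.
loads = [10.0, 10.2, 10.1, 25.0, 10.3, 9.9, 10.0, 10.2]
anomalies = windowed_anomalies(loads, window=4, step=2, eps=0.5, min_pts=3)
```

Requiring a point to be noise in every round that covers it makes the final label conservative: a reading that looks isolated in one window but fits a cluster in another is not reported.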
3. The method for predicting the multi-factor electrical load based on the deep learning of claim 1, wherein in the step 2, the autoregressive interpolation is specifically as follows:
completing the missing electrical load data values by a Lagrange interpolation method, so that a polynomial of degree $n-1$, $y = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$, passes through the $n$ points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$; the function expression of the Lagrange interpolation is then:
Figure FDA0004119822040000033
in the formula, $x_i$ and $x_j$ denote the $i$-th and $j$-th time instants of the power consumer, respectively; $y_i$ denotes the electrical load of the power consumer at the $i$-th time instant; and $n$ denotes the total number of time instants of the electrical load.
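The interpolation in claim 3 can be illustrated with the standard Lagrange form, $\sum_{i=1}^{n} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$, which is assumed here to match the formula image that is not reproduced in this text. The function name and the sample points are hypothetical.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Known readings at times 0, 1 and 3 (they lie on y = x^2 + x + 1);
# the reading at time 2 is missing and is filled in by interpolation.
times = [0.0, 1.0, 3.0]
loads = [1.0, 3.0, 13.0]
filled = lagrange_interpolate(times, loads, 2.0)
```

In practice only a few neighbouring readings around the gap are normally used, since a high-degree polynomial through many points tends to oscillate between them.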
4. The method for predicting the electrical load according to claim 1, wherein in the step 2, the processing formula of the normalization of the sequence data is as follows:
Figure FDA0004119822040000041
wherein $X = \{X_1, X_2, X_3, \dots, X_N\}$ denotes electrical load data or environmental impact data of the same category; the normalized values are $Y = \{Y_1, Y_2, Y_3, \dots, Y_N\}$, $l \in [1, N]$, and $N$ is the total number of data items of a given category of electrical load data or environmental impact data.
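The normalization formula of claim 4 is an image that is not reproduced in this text. As an illustration only, the sketch below assumes min-max scaling, $Y_l = (X_l - \min X) / (\max X - \min X)$, which is a common choice for load series; the claim may define a different normalization. The function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values of one category to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def min_max_denormalize(normalized, lo, hi):
    """Map normalized values back to the original scale."""
    return [y * (hi - lo) + lo for y in normalized]

# Hypothetical readings of one category, for example daily maximum temperature.
x = [10.0, 20.0, 30.0]
y = min_max_normalize(x)
restored = min_max_denormalize(y, min(x), max(x))
```

Each category is normalized separately, and the minimum and maximum must be kept so that model outputs can be mapped back to physical units.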
5. The method for predicting the multifactor electric load based on the deep learning of claim 1, wherein in the step 4, the prediction of the electric load comprises: ultra-short term power load prediction, and medium-long term power load prediction.
CN202111232185.5A 2021-10-22 2021-10-22 A Multi-factor Power Load Forecasting Method Based on Deep Learning Active CN113962364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111232185.5A CN113962364B (en) 2021-10-22 2021-10-22 A Multi-factor Power Load Forecasting Method Based on Deep Learning

Publications (2)

Publication Number Publication Date
CN113962364A CN113962364A (en) 2022-01-21
CN113962364B true CN113962364B (en) 2023-04-18

Family

ID=79466128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111232185.5A Active CN113962364B (en) 2021-10-22 2021-10-22 A Multi-factor Power Load Forecasting Method Based on Deep Learning

Country Status (1)

Country Link
CN (1) CN113962364B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934392A (en) * 2019-02-28 2019-06-25 武汉大学 A short-term load forecasting method for microgrid based on deep learning
CN110659779A (en) * 2019-09-26 2020-01-07 国网湖南省电力有限公司 A Loss Prediction Method of Distribution System Based on Long Short-Term Memory Network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102128363B1 (en) * 2018-11-02 2020-06-30 경희대학교 산학협력단 System of management of energy trading and method of the same

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Vasileios Tosounidis.Wind Power Forecasting with Deep Learning Methods.https://ikee.lib.auth.gr/record/343548/files/GRI-2022-37466.pdf.2022,全文. *
Xinyi Wang 等.Self-Attention based Neural Network for Predicting RNA-Protein Binding Sites.IEEE/ACM Transactions on Computational Biology and Bioinformatics.2022,1-12. *
闫建荣.基于数据增强的RNA-蛋白质相互作用预测研究.中国优秀硕士学位论文全文数据库 (基础科学辑).2022,A006-72. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant