CN115115125B - Photovoltaic power interval probability prediction method based on deep learning fusion model

Info

Publication number: CN115115125B
Application number: CN202210823439.9A
Authority: CN
Inventors: 王开艳; 杜浩东; 贾嵘; 王颂凯; 刘恒
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2022-07-13
Filing date: 2022-07-13
Publication date: 2024-08-02
Anticipated expiration: 2042-07-13
Also published as: CN115115125A

Abstract

The invention discloses a photovoltaic power interval probability prediction method based on a deep learning fusion model, comprising the following steps: first, obtaining photovoltaic power and meteorological factor data, performing variable correlation analysis, and determining the input variables of the prediction model; selecting a clustering variable, constructing its statistical features, clustering the historical photovoltaic data into similar days with the fuzzy C-means clustering algorithm, and normalizing the resulting similar-day data sets; then dividing each similar-day data set into a training set and a test set; building and training a QR-CNN-BiLSTM interval prediction model and predicting the photovoltaic power interval; and finally generating the photovoltaic probability prediction result on the test set. The method tracks the future photovoltaic power trend well, measures the uncertainty of photovoltaic power prediction with high accuracy while meeting reliability requirements, generates the photovoltaic power prediction interval at the corresponding confidence level, and has practical application value.

Description

Photovoltaic power interval probability prediction method based on deep learning fusion model

Technical Field

The present invention belongs to the technical field of photovoltaic power generation prediction, and specifically relates to a photovoltaic power interval probability prediction method based on a deep learning fusion model.

Background Art

In recent years, environmental pollution has worsened and the shortage of non-renewable energy has become increasingly prominent, so countries around the world have begun to seek new paths for energy development. China began implementing its new energy development strategy in 2014, and solar energy, an important part of the energy transition, has since been developed and utilized on a large scale. By the end of 2021, China's total installed photovoltaic capacity had reached 306 million kW, of which 53 million kW was newly installed in 2021. Large-scale grid-connected new energy generation, represented by photovoltaic power, is an unstoppable trend and a defining feature of the next generation of power systems. However, photovoltaic generation is affected by many complex environmental factors, which give photovoltaic power strong random fluctuation, intermittency and non-stationarity. As a high proportion of photovoltaic generation is connected to the power system, this uncontrollable source can seriously threaten the system's safe and stable operation. Research on photovoltaic power prediction is therefore of great significance for building a new generation of power systems in China that can accommodate a high proportion of renewable energy, and of great value for building a comprehensive security defense system for the power system and for risk control.

Existing photovoltaic power prediction techniques can be classified by the form of their results into deterministic prediction and uncertainty prediction. Deterministic prediction yields a single point forecast, which is intuitive but cannot express the uncertainty of the prediction. Uncertainty prediction gives the possible range of photovoltaic power at future times together with its confidence level, so it provides more comprehensive data support for power system dispatching and has greater engineering significance. Classified by model, there are physical models and data-driven models. Physical models are built from the characteristics of the photovoltaic modules, the installation angle and similar properties, while also taking geographical conditions and meteorological elements into account; their construction is complex and they are currently seldom used. Data-driven models mainly comprise statistical methods and artificial intelligence algorithms. Statistical methods use curve fitting and parameter estimation to relate photovoltaic power to its influencing factors; the time series method and the grey model are commonly used. Artificial intelligence models, represented by neural networks and deep learning models, have strong nonlinear data processing capability and have been widely adopted in recent years.

Although many scholars have done a great deal of work on photovoltaic power prediction, the following problems remain: (1) Photovoltaic power prediction concentrates mostly on point prediction, with little research on interval probability prediction. (2) The reliability and sharpness of existing interval probability prediction models are not high, and their performance urgently needs further improvement. (3) The case data used in most studies have a 1 h or 15 min interval. When power is to be predicted at 5 min intervals, the photovoltaic power fluctuations are more complex and variable; a traditional single model cannot cope well with this difficulty, and multi-model fusion is one of the future solutions.

Summary of the Invention

The purpose of the present invention is to provide a photovoltaic power interval probability prediction method based on a deep learning fusion model, which solves the problem of inaccurate photovoltaic power interval prediction and probability prediction results.

The technical solution adopted by the present invention is a photovoltaic power interval probability prediction method based on a deep learning fusion model, implemented according to the following steps:

Step 1: Obtain the preprocessed meteorological element data and historical photovoltaic power data, perform variable correlation analysis between photovoltaic power and the meteorological factors, and determine the input variables of the prediction model;

Step 2: Select a clustering variable and construct its statistical features;

Step 3: Based on the clustering variable selected in step 2 and its statistical features, cluster the historical photovoltaic data into similar days with the fuzzy C-means clustering algorithm to obtain the photovoltaic similar-day data sets;

Step 4: Normalize the photovoltaic similar-day data sets with the min-max normalization method;

Step 5: Divide the similar-day data set for each weather condition into a training set and a test set;

Step 6: Build the QR-CNN-BiLSTM interval prediction model, set the parameters of the CNN and BiLSTM models, and set the parameters for model training; input the training set of the similar-day data set into the model for training;

Step 7: Input the test set data into the QR-CNN-BiLSTM interval prediction model trained in step 6 to perform photovoltaic power interval prediction;

Step 8: Denormalize the interval prediction results so that they regain their physical meaning;

Step 9: Use a kernel density estimation algorithm optimized by cross-validation and grid search to generate the photovoltaic probability prediction results on the test set.

The present invention is further characterized in that:

Step 1 is specifically:

Step 1.1: Select the preprocessed meteorological element data and photovoltaic power generation data;

The time resolution of the meteorological element variables and the photovoltaic power variable is 5 min. Wind speed, relative humidity, ambient temperature, global horizontal radiation, diffuse horizontal radiation, rainfall, wind direction, global tilted radiation and diffuse tilted radiation are selected as the original meteorological element variables;

Step 1.2: Use the Kendall rank correlation coefficient R to measure the degree of correlation between the variables;

Step 1.3: Select the meteorological factors whose Kendall rank correlation coefficient R with photovoltaic power has an absolute value of not less than 0.5, and use them as inputs of the prediction model.

Step 2 is specifically:

Step 2.1: Select the meteorological factor variable with the highest Kendall rank correlation coefficient with photovoltaic power as the clustering variable;

Step 2.2: Select the mean, standard deviation, maximum, number of peaks and troughs, coefficient of variation, kurtosis and skewness of the clustering variable as its statistical features.

Step 3 is specifically:

Step 3.1: Based on the statistical features of the clustering variable constructed in step 2, compute the values of the seven statistical features of the clustering variable for each day;

Step 3.2: Determine the number of clusters c, initialize the cluster centers V_j, choose the fuzzification parameter m, initialize the membership matrix U^(0), and choose the termination criterion ε of the algorithm;

Step 3.3: Compute all cluster centers of the t-th iteration according to formula (5) to obtain the cluster center matrix:

$$V_j^{(t)} = \frac{\sum_{i=1}^{n} \left(u_{ij}^{(t-1)}\right)^{m} x_i}{\sum_{i=1}^{n} \left(u_{ij}^{(t-1)}\right)^{m}}, \quad j = 1, \dots, c \quad (5)$$

where u_ij is the membership of the i-th sample in the j-th class; x_i is a sample point; m is the membership factor (the fuzzification parameter); n is the number of samples; t is the iteration number; and c is the number of clusters;

Step 3.4: Update the membership matrix U^(t) as shown in formula (6):

$$u_{ij}^{(t)} = \frac{1}{\sum_{k=1}^{c} \left(\dfrac{d_{ij}}{d_{ik}}\right)^{\frac{2}{m-1}}} \quad (6)$$

where u_ij is the membership of the i-th sample in the j-th class; x_i is a sample point; m is the membership factor; n is the number of samples; t is the iteration number; c is the number of clusters; and d_ij is the distance from the i-th sample to the center of the j-th cluster;

Step 3.5: Compute ||U^(t) - U^(t-1)|| and check whether the stopping condition ||U^(t) - U^(t-1)|| < ε is met. If it is, stop iterating; otherwise repeat steps 3.3 and 3.4 until it is met. The result is a similar-day data set for each weather condition.
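Steps 3.2 to 3.5 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `fuzzy_c_means`, the random initialization of U^(0), the use of the largest absolute element difference as the norm in the stopping test, and the defaults m = 2 and ε = 1e-5 are all hypothetical choices, since the source does not fix them.

```python
import math
import random


def fuzzy_c_means(samples, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Cluster `samples` (list of equal-length feature lists) into c fuzzy clusters.

    Returns (centers, u) where u[i][j] is the membership of sample i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])

    # Step 3.2 (assumed form): random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Step 3.3: centers are membership-weighted means of the samples.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(w * samples[i][k] for i, w in enumerate(weights)) / total
                 for k in range(dim)]
            )

        # Step 3.4: memberships from relative distances to every center.
        new_u = []
        for x in samples:
            dists = [math.dist(x, v) for v in centers]
            if min(dists) == 0.0:
                # Sample sits exactly on a center: give it full membership there.
                hits = [d == 0.0 for d in dists]
                row = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        # Step 3.5: stop once the membership matrix no longer changes noticeably.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break

    return centers, u
```

Each day would be passed in as its vector of seven statistical features, and a day can then be assigned to the cluster in which its membership is largest.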

Step 6 is specifically:

Step 6.1: Build feature maps from the normalized similar-day data set using a sliding window and input them into the CNN, whose convolutional and pooling layers extract feature vectors that characterize the dynamic changes of photovoltaic power;

Step 6.2: Convert the output feature vectors into a time series and input it into the BiLSTM network;

Step 6.3: Introduce the QR model and combine it with CNN-BiLSTM; the QR model and the CNN-BiLSTM model are fused through the loss function;

Let the CNN-BiLSTM point prediction model be expressed as a function of X_t and Ω, where X_t is the model input (the independent variable), Ω denotes the model parameters of CNN-BiLSTM, Y_t is the dependent variable, and the model output is the predicted value of Y_t;

The QR-CNN-BiLSTM model is then obtained by letting the model parameters depend on the quantile τ: the estimate of the model parameters Ω(τ) at each quantile is obtained by minimizing the loss function;

Step 6.4: Set the model parameters and train the model;

The QR-CNN-BiLSTM network consists of one CNN layer, one max-pooling layer, three BiLSTM layers and one fully connected layer, connected in sequence;

The number of convolutional layers is set to 1, the number of convolution kernels to 64, the convolution kernel size to 4, the convolution boundary handling (padding) to 'same', the activation function to , and the pooling window size to 3;

The number of BiLSTM layers is set to 3, the number of neurons to 128, and the dropout rate to 0.2; the quantiles start at 0.05 with a step of 0.05 and an end point of 1 (exclusive), which gives the 19 quantiles 0.05, 0.10, ..., 0.95, so the fully connected layer has 19 neurons;

The initial learning rate is set to 0.01, the learning rate decay to 1.5, the minimum learning rate to 10^-4, the maximum number of iterations to 100, the batch size to 32, and the optimizer to Adam;

The training set of the similar-day data set is input into the constructed QR-CNN-BiLSTM model for training.
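Two details of step 6 lend themselves to a short sketch: the sliding-window construction of step 6.1 and the quantile grid of step 6.4 that determines the 19 output neurons. The helper names `sliding_windows` and `quantile_levels`, the window length, and the one-step-ahead target are assumptions made for illustration; the source does not state the window length or the forecast horizon.

```python
def sliding_windows(rows, targets, window):
    """Pair each block of `window` consecutive feature rows with the next target.

    rows    : list of per-time-step feature lists (already normalized)
    targets : list of per-time-step photovoltaic power values
    """
    inputs, outputs = [], []
    for end in range(window, len(rows)):
        inputs.append(rows[end - window:end])   # the `window` steps before `end`
        outputs.append(targets[end])            # assumed one-step-ahead target
    return inputs, outputs


def quantile_levels(start=0.05, step=0.05, stop=1.0):
    """Quantile levels from `start` in increments of `step`, excluding `stop`."""
    count = round((stop - start) / step)
    return [round(start + k * step, 10) for k in range(count)]


levels = quantile_levels()
print(len(levels), levels[0], levels[-1])
```

With the settings given in the text the grid has 19 levels, one per neuron of the fully connected output layer.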

The beneficial effects of the present invention are:

The photovoltaic power interval probability prediction method based on a deep learning fusion model of the present invention proposes a correlation analysis based on the Kendall rank correlation coefficient to determine the model input variables, which reduces the invalid information in the historical data and improves training efficiency. A meteorological factor variable that is highly correlated with photovoltaic power is selected as the clustering variable, and seven statistical features of it, such as the mean, are used as clustering features so that the fluctuation patterns and characteristics of each day's historical data are fully reflected and the FCM algorithm can cluster efficiently and reasonably. The QR-CNN-BiLSTM model fuses two deep learning models, CNN and BiLSTM, and achieves higher prediction accuracy than a traditional single deep learning prediction model. When future photovoltaic power is predicted at a fine time granularity of 5 min, the power changes more sharply in non-sunny weather; the method nevertheless tracks the future photovoltaic power trend well, measures the uncertainty of photovoltaic power prediction with high accuracy while meeting reliability requirements, and generates the photovoltaic power prediction interval at the corresponding confidence level, which gives it practical application value.

Brief Description of the Drawings

FIG. 1 is a flow chart of the photovoltaic power interval probability prediction method based on a deep learning fusion model of the present invention;

FIG. 2 is a flow chart of the fuzzy C-means clustering algorithm used in the method of the present invention;

FIG. 3 shows the interval prediction results of the QR-CNN-BiLSTM model of the method of the present invention on a sunny day;

FIG. 4 shows the interval prediction results of the QR-LSTM model on a sunny day;

FIG. 5 shows the interval prediction results of the QR-BiLSTM model on a sunny day;

FIG. 6 shows the interval prediction results of the QR-CNN-BiLSTM model of the method of the present invention on a sunny-to-cloudy day;

FIG. 7 shows the interval prediction results of the QR-LSTM model on a sunny-to-cloudy day;

FIG. 8 shows the interval prediction results of the QR-BiLSTM model on a sunny-to-cloudy day;

FIG. 9 shows the interval prediction results of the QR-CNN-BiLSTM model of the method of the present invention on a rainy day;

FIG. 10 shows the interval prediction results of the QR-LSTM model on a rainy day;

FIG. 11 shows the interval prediction results of the QR-BiLSTM model on a rainy day;

FIG. 12 shows the probability prediction results of the QR-CNN-BiLSTM model of the method of the present invention combined with kernel density estimation on a sunny day;

FIG. 13 shows the probability prediction results of the QR-CNN-BiLSTM model of the method of the present invention combined with kernel density estimation on a sunny-to-cloudy day;

FIG. 14 shows the probability prediction results of the QR-CNN-BiLSTM model of the method of the present invention combined with kernel density estimation on a rainy day.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The photovoltaic power interval probability prediction method based on a deep learning fusion model of the present invention, as shown in FIG. 1, is implemented according to the following steps:

Step 1: Obtain the preprocessed meteorological element data and historical photovoltaic power data, perform variable correlation analysis between photovoltaic power and the meteorological factors, and determine the input variables of the prediction model; specifically:

Step 1.1: Select the preprocessed meteorological element data and photovoltaic power generation data, keeping the time resolution of the meteorological element variables and the photovoltaic power variable consistent;

The time resolution of the meteorological element variables and the photovoltaic power variable is 5 min. Wind speed, relative humidity, ambient temperature, global horizontal radiation, diffuse horizontal radiation, rainfall, wind direction, global tilted radiation and diffuse tilted radiation are selected as the original meteorological element variables. Because the photovoltaic array does not generate electricity at night, the night-time data with zero power are removed, and the data from 6:00 to 19:30 each day are retained for the case analysis.

Missing values in the historical meteorological data and photovoltaic power generation data are located and filled by interpolation; outliers in the raw historical meteorological data and photovoltaic power generation data are detected with a box plot and replaced with the upper or lower boundary of the box plot;
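The gap filling and outlier replacement described above can be sketched as follows. The source names neither the interpolation scheme nor the whisker length of the box plot, so linear interpolation, the conventional 1.5 × IQR whiskers, and the helper names `interpolate_gaps` and `clip_to_box_plot` are assumptions made for this illustration.

```python
import statistics


def interpolate_gaps(values):
    """Fill None entries by linear interpolation between the nearest known values."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: repeat the first known value
            filled[i] = values[right]
        elif right is None:     # gap at the end: repeat the last known value
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def clip_to_box_plot(values, whisker=1.5):
    """Replace values outside the box-plot whiskers with the whisker boundaries."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [min(max(v, lower), upper) for v in values]
```

For example, `interpolate_gaps([1.0, None, 3.0])` fills the gap with the midpoint of its neighbours, and `clip_to_box_plot` leaves in-range values unchanged while pulling extreme ones back to the nearest whisker.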

The Kendall rank correlation coefficient R is defined in formula (1):

$$R = \frac{P - Q}{\frac{1}{2} n (n - 1)} \quad (1)$$

where P is the number of concordant pairs, Q is the number of discordant pairs, and n(n-1)/2 is the total number of observation pairs for n observations. Two pairs of observations (A_i, B_i) and (A_j, B_j) of variables A and B are concordant if A_i < A_j and B_i < B_j, or if A_i > A_j and B_i > B_j; otherwise they are discordant;

The closer the absolute value of the Kendall rank correlation coefficient R is to 1, the stronger the correlation between the meteorological variable and the output power. A positive coefficient indicates a positive correlation and a negative coefficient indicates a negative correlation;

Step 1.3: Select the meteorological factors whose Kendall rank correlation coefficient R with photovoltaic power has an absolute value of not less than 0.5, and use them as inputs of the prediction model;

In this example, the Kendall rank correlation coefficients R between the photovoltaic power variable and each meteorological element variable are shown in Table 1:

Table 1 Meteorological factor variables

The meteorological factors that are highly correlated with photovoltaic power, that is, those whose Kendall rank correlation coefficient with photovoltaic power has an absolute value of not less than 0.5, are used as inputs of the prediction model. Global horizontal radiation, diffuse horizontal radiation, global tilted radiation and diffuse tilted radiation are therefore selected as the meteorological element variables of this embodiment;
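The selection rule of step 1 can be illustrated with a small pure-Python sketch. It computes the coefficient as (P - Q) divided by the total number of pairs, with tied pairs counted as neither concordant nor discordant; the function names `kendall_r` and `select_inputs` and the toy series are hypothetical, and the quadratic pair loop is only meant for small examples.

```python
from itertools import combinations


def kendall_r(a, b):
    """Kendall rank correlation: (concordant - discordant) / total pairs."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    concordant = discordant = 0
    for (a_i, b_i), (a_j, b_j) in combinations(zip(a, b), 2):
        sign = (a_i - a_j) * (b_i - b_j)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def select_inputs(power, candidates, threshold=0.5):
    """Keep the candidate variables whose |R| with power reaches the threshold."""
    return [name for name, series in candidates.items()
            if abs(kendall_r(series, power)) >= threshold]


power = [0, 1, 2, 3, 4]
candidates = {"irradiance": [0, 2, 4, 6, 9], "wind": [3, 1, 4, 1, 5]}
print(select_inputs(power, candidates))
```

The same `kendall_r` values could also be used in step 2.1 to pick the single most correlated variable as the clustering variable.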

Step 2: Select a clustering variable and construct its statistical features, specifically:

Step 2.1: Select the meteorological factor variable with the highest Kendall rank correlation coefficient with photovoltaic power as the clustering variable; in this embodiment, global horizontal radiation is selected as the clustering variable;

Step 2.2: Select the mean, standard deviation, maximum, number of peaks and troughs, coefficient of variation, kurtosis and skewness of the clustering variable as its statistical features;

The coefficient of variation C, kurtosis K and skewness D are defined in formulas (2) to (4), respectively:

In these formulas, σ is the standard deviation of the variable, X_i is an individual sample of the variable, and M is the total number of samples of the variable; the formulas also use the mean of the variable;
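The seven daily features of step 2.2 can be sketched as below. Because formulas (2) to (4) are not reproduced in this text, the sketch assumes the common population forms: coefficient of variation as standard deviation over mean, and skewness and kurtosis as the third and fourth standardized moments (without subtracting 3 from the kurtosis). Counting strict interior local maxima and minima as peaks and troughs is likewise an assumption, as is the name `daily_features`.

```python
import math


def daily_features(samples):
    """Return the seven clustering features for one day's series of samples."""
    m = len(samples)
    mean = sum(samples) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / m)

    # Peaks and troughs: interior points strictly above or below both neighbours.
    turning_points = sum(
        1
        for prev, cur, nxt in zip(samples, samples[1:], samples[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )

    if std > 0:
        skewness = sum(((x - mean) / std) ** 3 for x in samples) / m
        kurtosis = sum(((x - mean) / std) ** 4 for x in samples) / m
    else:
        skewness = kurtosis = float("nan")   # undefined for a constant series

    return {
        "mean": mean,
        "std": std,
        "max": max(samples),
        "peaks_and_troughs": turning_points,
        "coefficient_of_variation": std / mean if mean else float("nan"),
        "kurtosis": kurtosis,
        "skewness": skewness,
    }
```

One such feature dictionary per day, flattened to a seven-element vector, would form the input to the clustering of step 3.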

Step 3: Based on the clustering variable selected in step 2 and its statistical features, and as shown in FIG. 2, cluster the historical photovoltaic data into similar days with the fuzzy C-means (FCM) clustering algorithm to obtain the photovoltaic similar-day data sets; specifically:

Step 3.1: Based on the statistical features of the clustering variable constructed in step 2, compute the values of the seven statistical features of the clustering variable for each day, and use them as the data to be clustered by the fuzzy C-means (FCM) clustering algorithm;

In formula (5): u_ij is the membership of the i-th sample in the j-th class; x_i is a sample point; m is the membership factor; n is the number of samples; t is the iteration number; and c is the number of clusters.

In formula (6): u_ij is the membership of the i-th sample in the j-th class; x_i is a sample point; m is the membership factor; n is the number of samples; t is the iteration number; c is the number of clusters; and d_ij is the distance from the i-th sample to the center of the j-th cluster.

Step 3.5: Compute ||U^(t) - U^(t-1)|| and check whether the stopping condition ||U^(t) - U^(t-1)|| < ε is met. If it is, stop iterating; otherwise repeat steps 3.3 and 3.4 until it is met. The result is a similar-day data set for each weather condition;

Step 4: Normalize the photovoltaic similar-day data sets with the min-max normalization method to eliminate the influence of the different scales (dimensions) of the variables on the prediction results;

The min-max normalization method is shown in formula (7):

$$x = \frac{x^{*} - x_{\min}}{x_{\max} - x_{\min}} \quad (7)$$

where x^* is the data to be normalized, x is the normalized data, x_max is the maximum value of a given variable, and x_min is the minimum value of that variable.
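Min-max normalization and the inverse mapping used later to denormalize the interval predictions (step 8) can be sketched as a pair of helpers. The function names are hypothetical, and taking the minimum and maximum from the training data only is a common practice assumed here rather than something the source states.

```python
def fit_min_max(values):
    """Return (minimum, maximum) of one variable, e.g. taken from the training set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant variable cannot be min-max normalized")
    return lo, hi


def normalize(values, lo, hi):
    """Map values into [0, 1] relative to the fitted range."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(values, lo, hi):
    """Invert `normalize`, restoring the original physical units."""
    return [v * (hi - lo) + lo for v in values]


lo, hi = fit_min_max([2.0, 4.0, 6.0])
scaled = normalize([2.0, 4.0, 6.0], lo, hi)
print(scaled, denormalize(scaled, lo, hi))
```

Applying `denormalize` with the range fitted for the power variable to each predicted quantile returns the interval bounds in power units.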

Step 6: Build the quantile regression (QR), convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) network, that is, the QR-CNN-BiLSTM interval prediction model; set the parameters of the CNN and BiLSTM models and the parameters for model training; and input the training set of the similar-day data set into the constructed QR-CNN-BiLSTM model for training; specifically:

The neuron output after the convolutional neural network has extracted local features is shown in formula (8);

where O is the local output of the neuron; I is the input of the neuron; l, m and n are the three dimensions of the output matrix; i, j and n are the length, width and depth of the convolution kernel K; b_n is the threshold (bias) of the convolution kernel; and the operator in formula (8) denotes matrix multiplication.
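To make the convolution and pooling stage concrete, the following sketch applies a single one-dimensional kernel with 'same' padding and then non-overlapping max pooling. It is a simplified stand-in, not formula (8) itself: it uses one input channel and one kernel, omits the activation function, pads the extra zero on the right for an even kernel size, and uses a pooling stride equal to the window; all of these, and the names `conv1d_same` and `max_pool1d`, are assumptions.

```python
def conv1d_same(signal, kernel, bias=0.0):
    """Slide `kernel` over `signal` (cross-correlation) with zero 'same' padding."""
    k = len(kernel)
    pad_total = k - 1
    pad_left = pad_total // 2            # any odd extra zero goes on the right
    pad_right = pad_total - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [
        sum(padded[t + i] * kernel[i] for i in range(k)) + bias
        for t in range(len(signal))      # output length equals input length
    ]


def max_pool1d(signal, window=3):
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [
        max(signal[s:s + window])
        for s in range(0, len(signal) - window + 1, window)
    ]


features = conv1d_same([1, 2, 3, 4, 5, 6], kernel=[0.25, 0.25, 0.25, 0.25])
print(max_pool1d(features, window=3))
```

With the settings in the text, 64 such kernels of size 4 would each produce one feature sequence, and pooling with window 3 would shorten every sequence before it is passed to the BiLSTM layers.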

Step 6.2: Convert the output feature vectors into a time series and input it into the BiLSTM network to further capture the long-term dependencies in the time series.

The BiLSTM network is computed as follows: the forget gate decides which information is removed from the memory cell state, as shown in formula (9);

$$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f) \quad (9)$$

The output of the previous time step and the input of the current time step are fed into the input gate, whose output is computed as shown in formula (10);

$$i_t = \sigma(W_i \cdot [h_{t-1}, X_t] + b_i) \quad (10)$$

The output of the previous time step and the input of the current time step are also used in the input gate to compute the candidate cell state, as shown in formula (11);

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, X_t] + b_C) \quad (11)$$

The current cell state is then updated, as shown in formula (12);

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \quad (12)$$

The output of the previous time step and the input of the current time step are fed into the output gate, whose output is computed as shown in formula (13);

$$o_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o) \quad (13)$$

The output of the output gate is combined with the cell state to obtain the output value, as shown in formula (14);

$$h_t = o_t * \tanh(C_t) \quad (14)$$

where f_t is the forget gate output at time t; σ and tanh are activation functions; h_{t-1} is the output at time t-1; X_t is the input at time t; W_f, W_i, W_C and W_o are weight coefficients; b_f, b_i, b_C and b_o are bias parameters; i_t and the candidate cell state C̃_t are the outputs of the input gate at time t; C_{t-1} is the cell state at time t-1; C_t is the cell state at time t; h_t is the output at time t; and o_t is the output gate value at time t after activation by the sigmoid function.

The data are fed in forward order into the forward LSTM layer to obtain its output. The data are then fed in reverse order into the backward LSTM layer, and the resulting sequence is reversed again to give the output of the backward LSTM layer. Finally, the outputs of the forward and backward LSTM layers are combined linearly with certain weights to obtain the output of the BiLSTM network;
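The gate computations of formulas (9) to (14) and the forward and backward passes just described can be traced with a scalar toy version. Real layers use weight matrices and vector states; here every gate has a hypothetical triple (weight on h_{t-1}, weight on X_t, bias), the initial states are zero, and the combination weights of 0.5 are placeholders, since the source only says the two directions are superimposed "with certain weights".

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for scalar input and state; p maps gate name to (w_h, w_x, b)."""
    def pre(name):
        w_h, w_x, b = p[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))                 # forget gate, formula (9)
    i_t = sigmoid(pre("i"))                 # input gate, formula (10)
    c_tilde = math.tanh(pre("C"))           # candidate cell state, formula (11)
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update, formula (12)
    o_t = sigmoid(pre("o"))                 # output gate, formula (13)
    h_t = o_t * math.tanh(c_t)              # hidden output, formula (14)
    return h_t, c_t


def run_lstm(xs, p):
    """Run the cell over a sequence, starting from zero states."""
    h = c = 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_bilstm(xs, p_fwd, p_bwd, w_fwd=0.5, w_bwd=0.5):
    """Weighted sum of the forward pass and the re-reversed backward pass."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]
    return [w_fwd * f + w_bwd * b for f, b in zip(forward, backward)]
```

Because the backward pass sees the sequence from the end, each combined output carries information from both earlier and later time steps of the window.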

The CNN-BiLSTM point prediction model maps the model input X_t (the independent variable) to the predicted value Ŷ_t of the dependent variable Y_t, where Ω denotes the model parameters of CNN-BiLSTM.

Step 6.4: Model parameter setting and model training.

Set the number of BiLSTM layers to 3, the number of neurons to 128, and the dropout rate to 0.2. Set the quantile levels to start at 0.05 with a step of 0.05 and an end point of 1 (exclusive), i.e. τ = 0.05, 0.10, ..., 0.95, so the fully connected layer has 19 neurons.
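The quantile grid and a quantile-regression loss can be sketched as follows. The grid matches the setting above; the loss shown is the standard pinball (check) loss of quantile regression, which is assumed here because the loss formula is not reproduced in this passage. All function names are hypothetical.

```python
def quantile_levels(start=0.05, step=0.05, stop=1.0):
    """Quantile levels from start up to, but not including, stop."""
    n = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(n)]

def pinball_loss(y, q, tau):
    """Pinball loss of predicting quantile value q at level tau for observation y."""
    u = y - q
    return max(tau * u, (tau - 1.0) * u)

def quantile_loss(y, preds, taus):
    """Mean pinball loss over all quantile outputs of the fully connected layer."""
    return sum(pinball_loss(y, q, t) for q, t in zip(preds, taus)) / len(taus)

TAUS = quantile_levels()
print(len(TAUS), TAUS[0], TAUS[-1])
```

Under-prediction at a high quantile level is penalized more heavily than over-prediction, which is what pushes each output neuron toward its own conditional quantile.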

The training set of each similar-day data set is fed into the constructed QR-CNN-BiLSTM model for training;

Step 7: Input the test-set data into the QR-CNN-BiLSTM model trained in step 6 to predict the photovoltaic power interval;

Step 8: Denormalize the interval prediction results so that they are physically meaningful;

Step 9: Based on the denormalized photovoltaic power interval prediction results obtained in step 8, a kernel density estimation (KDE) algorithm optimized by cross-validation and grid search is used to generate the photovoltaic probability prediction results on the test set;

Specifically, for a given future photovoltaic power prediction point, the QR-CNN-BiLSTM fusion model from step 7 yields one predicted value at each of the N conditional quantiles, i.e. a vector of N samples. The probability density function of this vector can be obtained by kernel density estimation, computed as shown in formula (15):

Where: N is the total number of samples; B is the optimal bandwidth determined by grid search with cross-validation, with B > 0; K is the kernel function. The present invention uses the Gaussian kernel as K, as shown in formula (16):

The kernel density estimation optimized by cross-validation and grid search, as proposed in the present invention, resolves the difficulty of bandwidth selection in kernel density estimation and can generate high-quality probability prediction results.
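A Gaussian-kernel KDE with a grid-searched bandwidth can be sketched in plain Python as below. The estimator and kernel take their standard textbook forms. The text does not specify the cross-validation scheme or the scoring rule, so leave-one-out log-likelihood is assumed here, and the sample values and bandwidth grid are made up for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, bandwidth):
    """Kernel density estimate at x from N samples with bandwidth B > 0."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)

def loo_log_likelihood(samples, bandwidth):
    """Leave-one-out log-likelihood (assumed cross-validation score)."""
    total = 0.0
    for i, x in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        # Guard against log(0) when the bandwidth is far too small.
        total += math.log(max(kde(x, rest, bandwidth), 1e-300))
    return total

def select_bandwidth(samples, grid):
    """Grid search: pick the candidate bandwidth with the best score."""
    return max(grid, key=lambda b: loo_log_likelihood(samples, b))

# Hypothetical quantile predictions for one time point (denormalized power values).
samples = [1.0, 1.2, 1.5, 1.9, 2.4]
grid = [0.05, 0.1, 0.2, 0.5, 1.0]
best_b = select_bandwidth(samples, grid)
print(best_b, kde(1.5, samples, best_b))
```

In the method described here, the samples for one prediction point would be the 19 quantile outputs of the trained model rather than the five illustrative values above.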

The photovoltaic power point prediction, interval prediction, and probability prediction results are evaluated. The root mean square error e_RMSE and the mean absolute percentage error e_MAPE are used to evaluate the point prediction results; the comprehensive interval evaluation index I_WC is used to evaluate the interval prediction results; the continuous ranked probability score P_CRPS is used to evaluate the probability prediction results, as shown in the following formulas:

I_WC = I_PINAW / I_PICP

Where:

Where: P_ri is the observed power and P_pi is the predicted power; N is the total number of predicted future photovoltaic power points. S_n is a logical value: S_n is 1 when the observed value falls within the prediction interval and 0 otherwise; E is the difference between the maximum and minimum observed values; P_upi and P_downi are the upper and lower bounds of the prediction interval, respectively; p(x) is the probability density function; F(P_pi) is the cumulative distribution function of P_pi; H(P_pi - P_ri) is the step function.
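The point and interval metrics can be sketched as below. Only the ratio I_WC = I_PINAW / I_PICP is given explicitly in this passage; RMSE, MAPE, PICP, and PINAW are written in their commonly used forms, chosen to agree with the symbol definitions above, and should be read as assumptions. CRPS is not covered by this sketch, and the function names are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observations and point predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error in percent (assumes nonzero observations)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def picp(obs, lower, upper):
    """Coverage: share of observations inside [lower, upper] (mean of S_n)."""
    hits = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return hits / len(obs)

def pinaw(obs, lower, upper):
    """Average interval width normalized by the observed range E."""
    e = max(obs) - min(obs)
    return sum(up - lo for lo, up in zip(lower, upper)) / (len(obs) * e)

def i_wc(obs, lower, upper):
    """Comprehensive interval index: narrower and better-covering is smaller."""
    return pinaw(obs, lower, upper) / picp(obs, lower, upper)

obs = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.5, 3.5]
upper = [1.5, 2.5, 4.5, 4.5]
print(picp(obs, lower, upper), pinaw(obs, lower, upper), i_wc(obs, lower, upper))
```

Because I_WC divides width by coverage, an interval that is narrow but misses many observations is penalized, which is why the index is used alongside the requirement that coverage stay above the nominal confidence level.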

The photovoltaic power interval probability prediction method based on a deep learning fusion model of the present invention first screens the meteorological factors with the correlation coefficient method, reducing the prediction error caused by too many irrelevant features. It then selects highly correlated meteorological factor variables as clustering variables, constructs their statistical features, and applies the FCM clustering algorithm to obtain similar-day data sets, which lays the foundation for further improving prediction accuracy. The QR-CNN-BiLSTM model fuses two deep learning models, CNN and BiLSTM; compared with a traditional single deep learning prediction model, it achieves higher prediction accuracy and generates higher-quality interval prediction results. Finally, kernel density estimation optimized by cross-validation and grid search resolves the difficulty of bandwidth selection and generates high-quality probability prediction results.

Example

The simulation data consist of the 2019 to 2020 photovoltaic power data of a photovoltaic array from one manufacturer at the Alice Springs site of the Desert Knowledge Australia Solar Centre (DKASC), together with the data for the four meteorological factors selected by the screening described above. The time resolution of the data set is 5 min. Because the photovoltaic array does not generate electricity at night, the night-time records with zero power are removed, and the data from 6:00 to 19:30 each day are retained for the case analysis.

The fuzzy C-means clustering algorithm divides the data set into three similar-day data sets: sunny, sunny-to-cloudy, and rainy. Each similar-day data set is split into a training set and a test set, with the training set accounting for 0.7 of the data. The test set consists of 30 days of similar-day data that are close in time, and the training set consists of the 70 days of similar-day data that immediately precede the test set.
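The split can be sketched as follows, assuming that "close in time" means the most recent days of each similar-day data set and that the training days are the 70 days immediately before them. The function name and the integer day labels are hypothetical.

```python
def split_similar_days(days, n_test=30, n_train=70):
    """Return (train, test) lists of day labels from one similar-day data set.

    days: sortable day labels (dates or integers) belonging to one weather class.
    The last n_test days form the test set; the n_train days before them
    form the training set, giving a 0.7 / 0.3 split for the default sizes.
    """
    ordered = sorted(days)
    if len(ordered) < n_test + n_train:
        raise ValueError("not enough similar days for the requested split")
    test = ordered[-n_test:]
    train = ordered[-(n_test + n_train):-n_test]
    return train, test

# Illustration with 120 integer day labels for a single weather class.
train_days, test_days = split_similar_days(range(120))
print(len(train_days), len(test_days))
```

Splitting by time rather than at random keeps the test days strictly later than the training days, which avoids leaking future information into training.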

To demonstrate the advantages of the proposed model in short-term interval and probability prediction of photovoltaic power, the proposed QR-CNN-BiLSTM model was compared with the QR-LSTM and QR-BiLSTM models under each of the three weather types. One day was randomly selected from each weather type for visual analysis. Figures 3-11 show the point predictions and the 95% confidence-level interval predictions of each model under sunny, sunny-to-cloudy, and rainy weather. Figures 12-14 show the probability prediction results of the QR-CNN-BiLSTM model combined with kernel density estimation under the same three weather types (the results for 9 equally spaced points out of 164 are displayed). The evaluation indicators are compared in Tables 2, 3 and 4.

Table 2 Evaluation indicators of the models

Table 3 Evaluation indicators of the models

Table 4 Evaluation indicators of the models

Table 2 shows that on sunny days the coverage of the prediction intervals of every model is close to 100%, while the prediction interval of the QR-CNN-BiLSTM model is markedly narrower than those of the other models: its PINAW is 72.77% lower than that of QR-LSTM and 66.26% lower than that of QR-BiLSTM. The QR-CNN-BiLSTM model also has the smallest comprehensive interval index I_WC, 72.62% lower than that of QR-LSTM and 66.08% lower than that of QR-BiLSTM, so its interval prediction performance is the best. Its CRPS value is the smallest, so its probability prediction performance is also the best. The deterministic evaluation indicators show that its point prediction performance is better as well.

Table 3 shows that on sunny-to-cloudy days, with the PICP of every model's prediction intervals above 95%, the PINAW of the QR-CNN-BiLSTM prediction interval is significantly reduced: 28.34% lower than that of QR-LSTM and 25.62% lower than that of QR-BiLSTM. The QR-CNN-BiLSTM model also has the smallest comprehensive interval index I_WC, 28.80% lower than that of QR-LSTM and 26.09% lower than that of QR-BiLSTM, so its interval prediction performance is the best. Its CRPS value is the smallest, so its probability prediction performance is also the best. The deterministic evaluation indicators show that its point prediction performance is better as well.

Table 4 shows that on rainy days, with the PICP of every model's prediction intervals above 95%, the PINAW of the QR-CNN-BiLSTM prediction interval is reduced: 7.98% lower than that of QR-LSTM and 4.47% lower than that of QR-BiLSTM. Its comprehensive interval index I_WC is also the smallest, 6.81% lower than that of QR-LSTM and 4.08% lower than that of QR-BiLSTM, which reduces the uncertainty of the photovoltaic power interval prediction, so QR-CNN-BiLSTM has the best interval prediction performance. Its CRPS value is the smallest, so its probability prediction performance is also the best. The deterministic evaluation indicators show that its point prediction performance is better as well. In summary, the QR-CNN-BiLSTM model is superior in point prediction, interval prediction, and probability prediction.

Claims

1. A photovoltaic power interval probability prediction method based on a deep learning fusion model is characterized by being implemented in the following steps:

Step 1: Obtain pre-processed meteorological element data and historical photovoltaic power data, perform variable correlation analysis on photovoltaic power and meteorological factors, and determine the input variables of the prediction model;

Step 2: Select clustering variables and construct statistical characteristics of clustering variables.

Step 3: Based on the clustering variables and their statistical characteristics selected in step 2, the fuzzy C-means clustering algorithm is used to cluster similar days of photovoltaic historical data to obtain a photovoltaic similar day data set;

Step 4: Use the min-max normalization method to normalize the photovoltaic similar day data set;

Step 5: Divide the similar day data sets under various weather conditions into training sets and test sets;

Step 6: Build the QR-CNN-BiLSTM interval prediction model, set the parameters of the CNN and BiLSTM models, and set the relevant parameters for model training; input the training set of the similar day data set into the model for training; specifically:

Step 6.1: construct a feature map of the normalized similar day data set in the form of a sliding window and input it into the CNN network, and use its convolutional layer and pooling layer to extract the feature vector that characterizes the dynamic change of photovoltaic power;

Step 6.2: Convert the output feature vector into a time series and input it into the BiLSTM network;

Step 6.3: Introduce the QR model and combine it with CNN-BiLSTM. The QR model and CNN-BiLSTM model are fused in the form of loss function.

The CNN-BiLSTM point prediction model maps the model input X_t, i.e., the independent variable, to the predicted value Ŷ_t of the dependent variable Y_t, where Ω is the model parameter of CNN-BiLSTM;

Then the QR-CNN-BiLSTM model is expressed with quantile-dependent parameters Ω(τ), and the estimator of the model parameter Ω(τ) at each quantile is obtained by minimizing the loss function;

Step 6.4: Model parameter setting and model training;

The network structure of the QR-CNN-BiLSTM model consists of a layer of CNN, a layer of maximum pooling, three layers of BiLSTM, and a layer of fully connected layer.

Set the number of convolution layers to 1, the number of convolution kernels to 64, the convolution kernel size to 4, the convolution boundary processing method to 'same', the activation function to , and the pooling window size to 3;

Set the number of BiLSTM network layers to 3, the number of neurons to 128, and the dropout parameter to 0.2; set the starting point of the quantile to 0.05, the step size to 0.05, and the end point to 1, so the number of neurons in the fully connected layer is 19;

Set the initial learning rate to 0.01, the learning rate decay to 1.5, the minimum learning rate to 10^-4, the maximum number of iterations to 100, the batch parameter to 32, and the optimizer to Adam;

Input the training set of similar day data set into the constructed QR-CNN-BiLSTM model for model training;

Step 7: Input the test set data into the QR-CNN-BiLSTM interval prediction model trained in step 6 to perform photovoltaic power interval prediction;

Step 8: Denormalize the interval prediction results to make them physically meaningful;

Step 9: Use the kernel density estimation algorithm optimized by cross-validation and grid search methods to generate photovoltaic probability prediction results on the test set.

2. The photovoltaic power interval probability prediction method based on deep learning fusion model according to claim 1 is characterized in that in step 1, specifically:

Step 1.1, select the pre-processed meteorological element data and photovoltaic power generation data;

The time resolution of meteorological element variables and photovoltaic power variables is 5 min, and wind speed, relative humidity, ambient temperature, total horizontal radiation, diffuse horizontal radiation, rainfall, wind direction, total oblique radiation and diffuse oblique radiation are selected as the original meteorological element data variables;

Step 1.2, the Kendall rank correlation coefficient R is used to measure the degree of correlation between multiple meteorological element variables;

Step 1.3: Select the meteorological factors whose Kendall rank correlation coefficient R with photovoltaic power has an absolute value of not less than 0.5, and use them as input to the prediction model.

3. The photovoltaic power interval probability prediction method based on deep learning fusion model according to claim 2 is characterized in that in step 2, specifically:

Step 2.1, select the meteorological factor variable with the highest Kendall rank correlation coefficient with photovoltaic power as the clustering variable;

Step 2.2: Select the mean, standard deviation, maximum value, number of peaks and troughs, coefficient of variation, kurtosis and skewness of the clustering variable as statistical features.

4. The photovoltaic power interval probability prediction method based on deep learning fusion model according to claim 1 is characterized in that in step 3, specifically:

Step 3.1, based on the statistical characteristics of the clustering variables constructed in step 2, calculate the values of the seven statistical characteristics of the clustering variables on each day;

Step 3.2, determine the number of data clustering categories c, initialize the cluster center V_i, give the fuzzification parameter m, initialize the membership matrix U^(0), and give the termination criterion ε of the algorithm;

Step 3.3: Calculate all cluster centers of the tth iteration according to formula (5) to obtain the cluster center matrix:

Where: u_ij is the membership of the i-th sample to the j-th class; x_i is the sample point; m is the membership factor; n is the number of samples; t is the number of iterations; c is the number of clusters;

Step 3.4: Update the membership matrix U^(t). The calculation method is shown in formula (6):

Where: u_ij is the membership of the i-th sample to the j-th class; x_i is the sample point; m is the membership factor; n is the number of samples; t is the number of iterations; c is the number of clusters; d_ij is the distance from the i-th sample to the center of the j-th cluster;

Step 3.5, calculate ||U^(t) - U^(t-1)|| and verify whether the iteration stop condition ||U^(t) - U^(t-1)|| < ε is met. If the condition is met, stop the iteration; otherwise, repeat steps 3.3 and 3.4 until the condition is met, and finally obtain the similar-day data sets under various weather conditions.