WO2023087569A1 - Photovoltaic string communication abnormality identification method and system based on XGBoost

Info

Publication number
WO2023087569A1
WO2023087569A1 PCT/CN2022/078431 CN2022078431W WO2023087569A1 WO 2023087569 A1 WO2023087569 A1 WO 2023087569A1 CN 2022078431 W CN2022078431 W CN 2022078431W WO 2023087569 A1 WO2023087569 A1 WO 2023087569A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
photovoltaic
current value
sample set
period
Application number
PCT/CN2022/078431
Other languages
French (fr)
Chinese (zh)
Inventor
王振荣
曾谁飞
祝金涛
王青天
赵鹏程
王�华
任鑫
赵斌
李靖
Original Assignee
中国华能集团清洁能源技术研究院有限公司
Application filed by 中国华能集团清洁能源技术研究院有限公司
Publication of WO2023087569A1

Definitions

  • The present disclosure relates to the field of identifying communication abnormalities in photovoltaic strings, and in particular to an XGBoost-based method and system for identifying such abnormalities.
  • The present disclosure provides an XGBoost-based photovoltaic string communication abnormality identification method and system, to at least solve the technical problem in the related art that communication abnormalities of photovoltaic strings in a photovoltaic power station are identified inaccurately.
  • An embodiment of the first aspect of the present disclosure proposes an XGBoost-based photovoltaic string communication abnormality identification method, the method including:
  • inputting the normalized current values of each photovoltaic string in the photovoltaic power station during the period to be tested into the trained XGBoost model, and identifying the photovoltaic strings whose current values show a communication abnormality during that period.
  • An embodiment of the second aspect of the present disclosure proposes an XGBoost-based photovoltaic string communication abnormality identification system, the system including:
  • an obtaining module, used to obtain the current values of each photovoltaic string in the photovoltaic power station during the period to be tested, and the current values of all photovoltaic strings in the station during the preset period before the period to be tested together with the corresponding communication abnormality label data, and to normalize the current values;
  • a first sample module, used to record the normalized current values of all photovoltaic strings in the station during the preset period before the date to be tested, with the corresponding communication abnormality label data, as sample set A, and to record the current values of the faulty photovoltaic strings in A with their communication abnormality labels as sample set B;
  • a second sample module, used to input the current values of the faulty photovoltaic strings in sample set B into the pre-trained VaDE model, obtain the sample set of abnormal-communication current values generated by the VaDE model, record it as sample set C, and merge sample sets A and C into sample set D;
  • a training module, used to train the XGBoost model on sample set D to obtain the trained XGBoost model;
  • an identification module, used to input the normalized current values of each photovoltaic string in the photovoltaic power station during the period to be tested into the trained XGBoost model, and to identify the photovoltaic strings whose current values show a communication abnormality during that period.
  • An embodiment of the third aspect of the present disclosure proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment is implemented.
  • An embodiment of the fourth aspect of the present disclosure proposes a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment is implemented.
  • An embodiment of the fifth aspect of the present disclosure proposes a computer program product including computer program code which, when run on a computer, executes the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment.
  • An embodiment of the sixth aspect of the present disclosure proposes a computer program including computer program code which, when run on a computer, causes the computer to execute the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment.
  • The present disclosure provides an XGBoost-based photovoltaic string communication abnormality identification method and system. The method comprises: obtaining the current values of each photovoltaic string in the photovoltaic power station during the period to be tested, together with the current values of all photovoltaic strings in the station during a preset period before the period to be tested and the corresponding communication abnormality label data, and normalizing the current values; recording the normalized current values of all photovoltaic strings during the preset period before the date to be tested, with the corresponding communication abnormality label data, as sample set A, and recording the current values of the faulty photovoltaic strings in A with their communication abnormality labels as sample set B; inputting the current values of the faulty photovoltaic strings in sample set B into a pre-trained VaDE model to obtain the sample set of abnormal-communication current values generated by the VaDE model, recording it as sample set C, and merging sample sets A and C into sample set D; training an XGBoost model on sample set D to obtain a trained XGBoost model; and inputting the normalized current values of each photovoltaic string during the period to be tested into the trained XGBoost model to identify the photovoltaic strings whose current values show a communication abnormality during that period.
  • The technical solution provided by the present disclosure trains the XGBoost model on data generated by the VaDE model and uses the trained model to identify whether the current communication of each string in the photovoltaic power station is abnormal, which can improve the accuracy with which the XGBoost model identifies abnormal photovoltaic strings in the station.
  • Fig. 1 is a flow chart of an XGBoost-based photovoltaic string communication abnormality identification method provided according to an embodiment of the present disclosure
  • Fig. 2 is a structural diagram of an XGBoost-based photovoltaic string communication abnormality identification system according to the present disclosure.
  • Fig. 1 is a flow chart of an XGBoost-based photovoltaic string communication abnormality identification method provided in this embodiment. As shown in Fig. 1, the method includes:
  • Step 1: Obtain the current values of each photovoltaic string in the photovoltaic power station during the period to be tested, and the current values of all photovoltaic strings in the station during the preset period before the period to be tested together with the corresponding communication abnormality label data, and normalize the current values;
  • The current values of each photovoltaic string on the detection date, and the current values of all photovoltaic strings within the 30 days before the detection date together with the corresponding communication abnormality label data, are obtained; the 30-day window contains photovoltaic strings with abnormal current-value communication.
  • The current value of photovoltaic string m at the j-th moment in the period to be tested is normalized by a formula defined over the following quantities:
  • $x_{m,j}$ is the current value of photovoltaic string m at the j-th moment in the period to be tested;
  • $x_{m,k}$ is the current value of photovoltaic string m at moment k;
  • $h$ is the set of all moments in the measurement period of the acquired data;
  • $k$ is any moment in $h$.
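The symbols above ($x_{m,j}$, $x_{m,k}$, and a set $h$ over which $k$ ranges) are consistent with min-max scaling of each string's readings over the measurement period, but the formula itself does not appear in this text. The following is a minimal sketch under that assumption; the function name `min_max_normalize` and the handling of a flat series are illustrative choices, not taken from the source.

```python
from typing import List, Sequence


def min_max_normalize(currents: Sequence[float]) -> List[float]:
    """Scale one string's current readings over the period h into [0, 1].

    Assumption: min-max scaling. The source lists the symbols but not the formula.
    """
    lo, hi = min(currents), max(currents)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale by; map every reading to 0.0.
        return [0.0 for _ in currents]
    return [(x - lo) / span for x in currents]


readings = [0.0, 2.5, 5.0, 10.0]  # hypothetical readings for one string, in amperes
normalized = min_max_normalize(readings)
```

Scaling each string separately keeps strings of different rated current comparable, which matters if one model is meant to serve the whole station.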
  • Step 2: Record the normalized current values of all photovoltaic strings in the photovoltaic power station during the preset period before the date to be tested, with the corresponding communication abnormality label data, as sample set A, and record the current values of the faulty photovoltaic strings in A with their communication abnormality labels as sample set B;
  • Step 3: Input the current values of the faulty photovoltaic strings in sample set B into the pre-trained VaDE model, obtain the sample set of abnormal-communication current values generated by the VaDE model, record it as sample set C, and merge sample sets A and C into sample set D;
  • The pre-trained VaDE model includes a first neural network layer, a sampling layer and a second neural network layer;
  • The training procedure of the VaDE model comprises:
  • inputting the acquired data into the first neural network layer, sampling layer and second neural network layer of the initial VaDE model and, with the lower bound of the marginal likelihood as the model's loss function, training the model with the adaptive moment estimation (Adam) optimization algorithm to obtain the trained VaDE model.
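The text does not say how the sampling layer draws its output. In variational autoencoders, the family VaDE belongs to, this is usually done with the reparameterization trick, so that gradients can pass through the sampling step during Adam training. The sketch below assumes that convention; `sample_latent`, `mu` and `log_var` are illustrative names, with `mu` and `log_var` standing for the outputs of the first neural network layer.

```python
import math
import random
from typing import List


def sample_latent(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)  # fixed seed so the draw is repeatable
z = sample_latent(mu=[0.3, -1.2], log_var=[-1.0, -1.0], rng=rng)
```

The second neural network layer would then decode `z` back into a current curve; drawing several `z` values per faulty sample is one way a generative model can yield more abnormal samples than it was given.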
  • Step 4: Train the XGBoost model on sample set D to obtain the trained XGBoost model;
  • Step 5: Input the normalized current values of each photovoltaic string in the photovoltaic power station during the period to be tested into the trained XGBoost model, and identify the photovoltaic strings whose current values show a communication abnormality during that period.
  • Training the XGBoost model on sample set D to obtain the trained XGBoost model includes:
  • The sample set is $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), where $x_i$ is the vector of index values of the i-th sample at each moment of one day, $y_i$ is the label indicating whether the index in the i-th sample shows a communication abnormality during that day, $y_i \in \{0, 1\}$ with 0 for normal communication and 1 for abnormal communication, $n$ is the number of samples, and $m$ is the number of sample features;
  • A gradient boosting tree is constructed on sample set D by accumulating trees, one tree per iteration, with the CART regression tree as the sub-tree model.
  • A greedy algorithm grows each tree: for each sample feature it finds the split point with the largest gain, then compares across all features, so that a two-level loop over features and split points picks the split with the largest gain overall; the CART regression tree is split and grown at that point.
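The two-level loop over features and split points described above can be sketched as follows. The gain expression is the standard XGBoost structure score; the source does not spell it out, so its use here is an assumption, as are the names `best_split`, `lam` and `gamma`. The example gradients `g` and Hessians `h` are those of the logistic loss at an initial prediction of 0.5, also an illustrative choice.

```python
from typing import List, Optional, Tuple


def best_split(
    X: List[List[float]],
    g: List[float],
    h: List[float],
    lam: float = 1.0,
    gamma: float = 0.0,
) -> Optional[Tuple[float, int, float]]:
    """Exact greedy search: return (gain, feature index, threshold) of the best split."""
    n, m = len(X), len(X[0])
    G, H = sum(g), sum(h)
    best: Optional[Tuple[float, int, float]] = None
    for f in range(m):  # outer loop: features
        order = sorted(range(n), key=lambda i: X[i][f])
        gl = hl = 0.0
        for pos in range(n - 1):  # inner loop: candidate split points
            i, nxt = order[pos], order[pos + 1]
            gl += g[i]
            hl += h[i]
            if X[i][f] == X[nxt][f]:
                continue  # cannot split between equal feature values
            gr, hr = G - gl, H - hl
            gain = 0.5 * (gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)) - gamma
            if best is None or gain > best[0]:
                best = (gain, f, (X[i][f] + X[nxt][f]) / 2)
    return best


# Four samples, two features; labels are [0, 0, 1, 1].
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
g = [0.5, 0.5, -0.5, -0.5]    # p - y with p = 0.5
h = [0.25, 0.25, 0.25, 0.25]  # p * (1 - p)
best = best_split(X, g, h)
```

Here the first feature separates the two classes cleanly, so the search selects it with a threshold midway between 0.2 and 0.8.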
  • At the t-th iteration, the objective $L^{(t)}$ consists of a loss term and a regularization term, where $l$ is the loss function, specified inside the algorithm, between the true value $y_i$ and the predicted value $\hat{y}_i$, and is a differentiable convex function;
  • $f_t(x_i)$ is the weight of the leaf node that the i-th sample falls into in the t-th iteration;
  • $\Omega(f_t)$ is the regularization term, expressed in terms of the number of leaf nodes and the leaf-node weights of the tree in the t-th iteration;
  • a second-order Taylor expansion of $L^{(t)}$ is then taken at the previous iteration's prediction $\hat{y}_i^{(t-1)}$, and the leaf weights are determined by minimizing the loss function.
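The equations themselves are missing from this text. For reference, the standard XGBoost formulation matches the symbols defined above, and it is assumed here to be what the passage refers to. The symbols $\gamma$, $\lambda$, $T$ (number of leaves), $w_j$ (weight of leaf $j$), $I_j$ (samples in leaf $j$), and the first and second derivatives $g_i$, $h_i$ of $l$ with respect to $\hat{y}_i^{(t-1)}$ are not named in the source.

```latex
L^{(t)} = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

L^{(t)} \approx \sum_{i=1}^{n} \Bigl[\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr) + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^{2} \Bigr] + \Omega(f_t)

w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
G_j = \sum_{i \in I_j} g_i,
\quad
H_j = \sum_{i \in I_j} h_i
```

Setting the derivative of the expanded objective with respect to each $w_j$ to zero gives the closed-form leaf weight $w_j^{*}$, which is the sense in which the weight is "determined by minimizing the loss function".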
  • Finally, all the CART regression trees are merged to obtain the trained XGBoost model.
  • The index value obtained for each photovoltaic string here is the current value; other index values of the photovoltaic string may also be obtained, and this is not limited here.
  • The embodiment of the present disclosure provides an XGBoost-based photovoltaic string communication abnormality identification method that trains the XGBoost model on data generated by the VaDE model and uses the trained model to identify whether the current communication of each string in the photovoltaic power station is abnormal, which can improve the accuracy with which the XGBoost model identifies abnormal photovoltaic strings in the station.
  • Fig. 2 is a structural diagram of an XGBoost-based photovoltaic string communication abnormality identification system provided by an embodiment of the present disclosure. As shown in Fig. 2 , the system includes:
  • an obtaining module, used to obtain the current values of each photovoltaic string in the photovoltaic power station during the period to be tested, and the current values of all photovoltaic strings in the station during the preset period before the period to be tested together with the corresponding communication abnormality label data, and to normalize the current values;
  • a first sample module, used to record the normalized current values of all photovoltaic strings in the station during the preset period before the date to be tested, with the corresponding communication abnormality label data, as sample set A, and to record the current values of the faulty photovoltaic strings in A with their communication abnormality labels as sample set B;
  • a second sample module, used to input the current values of the faulty photovoltaic strings in sample set B into the pre-trained VaDE model, obtain the sample set of abnormal-communication current values generated by the VaDE model, record it as sample set C, and merge sample sets A and C into sample set D;
  • a training module, used to train the XGBoost model on sample set D to obtain the trained XGBoost model;
  • an identification module, used to input the normalized current values of each photovoltaic string in the photovoltaic power station during the period to be tested into the trained XGBoost model, and to identify the photovoltaic strings whose current values show a communication abnormality during that period.
  • The training module includes:
  • the sample set $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), where $x_i$ is the vector of index values of the i-th sample at each moment of one day, $y_i$ is the label indicating whether the index in the i-th sample shows a communication abnormality during that day, $y_i \in \{0, 1\}$ with 0 for normal communication and 1 for abnormal communication, $n$ is the number of samples, and $m$ is the number of sample features;
  • a training unit, used to construct a gradient boosting tree on sample set D with the CART regression tree as the sub-tree model, to add CART regression trees through iteration, and to merge all the CART regression trees to obtain the trained XGBoost model.
  • The XGBoost-based photovoltaic string communication abnormality identification system trains the XGBoost model on data generated by the VaDE model and uses the trained model to identify whether the current communication of each string in the photovoltaic power station is abnormal, which can improve the accuracy with which the XGBoost model identifies abnormal photovoltaic strings in the station.
  • An embodiment of the present disclosure also proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the XGBoost-based photovoltaic string communication abnormality identification method described in Embodiment 1 is implemented.
  • An embodiment of the present disclosure also proposes a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the XGBoost-based photovoltaic string communication abnormality identification method described in Embodiment 1 is implemented.
  • An embodiment of the present disclosure also proposes a computer program product including computer program code which, when run on a computer, executes the XGBoost-based photovoltaic string communication abnormality identification method described in Embodiment 1.
  • An embodiment of the present disclosure also proposes a computer program including computer program code which, when run on a computer, causes the computer to execute the XGBoost-based photovoltaic string communication abnormality identification method described in Embodiment 1.

Landscapes

  • Photovoltaic Devices (AREA)

Abstract

A photovoltaic string communication abnormality identification method and system based on XGBoost. The method comprises: acquiring current values of all photovoltaic strings in a photovoltaic power station within a time period to be subjected to detection, current values of all the photovoltaic strings in the photovoltaic power station within a preset time period prior to said time period, and corresponding communication abnormality label data, and normalizing the current values; training an XGBoost model on the basis of generated data of a VaDE model, so as to obtain a trained XGBoost model; and inputting, into the trained XGBoost model, normalized current values of all the photovoltaic strings in the photovoltaic power station within said time period, and identifying a photovoltaic string, which has a current value communication abnormality, in the photovoltaic power station within said time period.

Description

An XGBoost-based photovoltaic string communication abnormality identification method and system
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202111362748.2, filed in China on November 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of identifying communication abnormalities in photovoltaic strings, and in particular to an XGBoost-based method and system for identifying such abnormalities.
Background
With the rapid development of science and technology, photovoltaic power generation has been widely adopted at home and abroad. It takes many forms and is deployed in many settings, mainly large ground-mounted photovoltaic power stations, rooftops of residential and commercial buildings, building-integrated photovoltaics, and photovoltaic street lights. In practice, because a photovoltaic power station contains many photovoltaic strings, strings with abnormal communication cannot be found and handled in time. Such communication abnormalities make the data intermittently missing and erratically large or small, which causes string abnormality diagnosis models built on that data to produce false detections and missed detections and reduces the credibility of correlation analysis. A method and system that can accurately identify communication abnormalities of photovoltaic strings in photovoltaic power stations is therefore urgently needed.
Summary of the Invention
The present disclosure provides an XGBoost-based photovoltaic string communication abnormality identification method and system, to at least solve the technical problem in the related art that communication abnormalities of photovoltaic strings in a photovoltaic power station are identified inaccurately.
An embodiment of the first aspect of the present disclosure proposes an XGBoost-based photovoltaic string communication abnormality identification method, the method including:
obtaining the current values of each photovoltaic string in the photovoltaic power station during the period to be tested, and the current values of all photovoltaic strings in the station during the preset period before the period to be tested together with the corresponding communication abnormality label data, and normalizing the current values;
recording the normalized current values of all photovoltaic strings in the station during the preset period before the date to be tested, with the corresponding communication abnormality label data, as sample set A, and recording the current values of the faulty photovoltaic strings in A with their communication abnormality labels as sample set B;
inputting the current values of the faulty photovoltaic strings in sample set B into the pre-trained VaDE model to obtain the sample set of abnormal-communication current values generated by the VaDE model, recording it as sample set C, and merging sample sets A and C into sample set D;
training the XGBoost model on sample set D to obtain the trained XGBoost model;
inputting the normalized current values of each photovoltaic string in the photovoltaic power station during the period to be tested into the trained XGBoost model, and identifying the photovoltaic strings whose current values show a communication abnormality during that period.
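How sample sets A, B, C and D relate in the steps above can be sketched as follows. This is only an illustration of the data flow: the pre-trained VaDE model is replaced by a hypothetical stand-in, `make_jitter_generator`, that perturbs each faulty curve with clipped Gaussian noise, and the function names, the `copies_per_fault` parameter and the example curves are all assumptions.

```python
import random
from typing import Callable, List, Tuple

Curve = List[float]
Sample = Tuple[Curve, int]  # (normalized current curve, label: 0 normal, 1 abnormal)


def build_sample_set_d(
    sample_set_a: List[Sample],
    generate: Callable[[Curve], Curve],
    copies_per_fault: int = 2,
) -> List[Sample]:
    """Assemble D = A + C, where C is generated from the faulty subset B of A."""
    # B: the faulty (label 1) samples of A.
    sample_set_b = [s for s in sample_set_a if s[1] == 1]
    # C: synthetic abnormal samples, one generator call per copy.
    sample_set_c = [
        (generate(curve), 1)
        for curve, _ in sample_set_b
        for _ in range(copies_per_fault)
    ]
    return sample_set_a + sample_set_c


def make_jitter_generator(seed: int, scale: float = 0.05) -> Callable[[Curve], Curve]:
    """Hypothetical stand-in for the VaDE model: adds Gaussian noise, clipped to [0, 1]."""
    rng = random.Random(seed)

    def generate(curve: Curve) -> Curve:
        return [min(1.0, max(0.0, v + rng.gauss(0.0, scale))) for v in curve]

    return generate


sample_set_a = [
    ([0.0, 0.5, 1.0, 0.4], 0),
    ([0.1, 0.6, 0.9, 0.3], 0),
    ([0.0, 0.0, 1.0, 0.0], 1),  # readings drop to zero around a single spike
]
sample_set_d = build_sample_set_d(sample_set_a, make_jitter_generator(seed=0))
```

Abnormal days are rare compared with normal ones, so adding generated samples of the abnormal class rebalances the training data. The curves and labels in `sample_set_d` could then be passed to a gradient-boosted classifier, for example `xgboost.XGBClassifier().fit(X, y)`.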
An embodiment of the second aspect of the present disclosure proposes an XGBoost-based photovoltaic string communication abnormality identification system, the system including:
an obtaining module, used to obtain the current values of each photovoltaic string in the photovoltaic power station during the period to be tested, and the current values of all photovoltaic strings in the station during the preset period before the period to be tested together with the corresponding communication abnormality label data, and to normalize the current values;
a first sample module, used to record the normalized current values of all photovoltaic strings in the station during the preset period before the date to be tested, with the corresponding communication abnormality label data, as sample set A, and to record the current values of the faulty photovoltaic strings in A with their communication abnormality labels as sample set B;
a second sample module, used to input the current values of the faulty photovoltaic strings in sample set B into the pre-trained VaDE model, obtain the sample set of abnormal-communication current values generated by the VaDE model, record it as sample set C, and merge sample sets A and C into sample set D;
a training module, used to train the XGBoost model on sample set D to obtain the trained XGBoost model;
an identification module, used to input the normalized current values of each photovoltaic string in the photovoltaic power station during the period to be tested into the trained XGBoost model, and to identify the photovoltaic strings whose current values show a communication abnormality during that period.
An embodiment of the third aspect of the present disclosure proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment is implemented.
An embodiment of the fourth aspect of the present disclosure proposes a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment is implemented.
An embodiment of the fifth aspect of the present disclosure proposes a computer program product including computer program code which, when run on a computer, executes the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment.
An embodiment of the sixth aspect of the present disclosure proposes a computer program including computer program code which, when run on a computer, causes the computer to execute the XGBoost-based photovoltaic string communication abnormality identification method of the first-aspect embodiment.
本公开的实施例提供的技术方案至少带来以下有益效果:The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
本公开提供的一种基于XGBoost的光伏组串通信异常识别方法及系统,所述方法包括:获取待测时段内光伏电站中各光伏组串的电流值及待测时段前预设时段内光伏电站所有光伏组串的电流值及对应的通信异常标签数据,并将所述电流值归一化;将归一化后的 待测日期前预设时段内光伏电站所有光伏组串的电流值及对应的通信异常标签数据记作样本集A,并将A中故障光伏组串的电流值及对应的通信异常标签构成的样本集记作B;将所述样本集B中故障光伏组串的电流值输入预先训练好的VaDE模型中,得到VaDE模型生成的故障光伏组串电流值通信异常的样本集,并将所述样本集记作C,合并样本集A和样本集C得到合并后的样本集D;利用样本集D对XGBoost模型进行训练,得到训练好的XGBoost模型;将待测时段内光伏电站中各光伏组串的电流归一化后的值输入训练好的XGBoost模型中,识别出待测时段内光伏电站中电流值通信异常的光伏组串。本公开提供的技术方案,基于VaDE模型的生成数据对XGBoost模型进行训练,得到训练好的XGBoost模型,利用所述训练好的模型对光伏电站中各组串的电流通信是否异常进行识别,可以提升XGBoost模型识别光伏电站中异常光伏组串的准确性。The present disclosure provides an XGBoost-based photovoltaic string communication abnormality identification method and system, the method comprising: obtaining the current value of each photovoltaic string in the photovoltaic power station within the period to be tested and the photovoltaic power station within the preset period before the period to be tested The current values of all photovoltaic strings and the corresponding abnormal communication tag data, and normalize the current values; the normalized current values and corresponding The data of the abnormal communication tag in the sample set is recorded as sample set A, and the sample set composed of the current value of the faulty photovoltaic string in A and the corresponding abnormal communication tag is recorded as B; the current value of the faulty photovoltaic string in the sample set B is Input the pre-trained VaDE model to obtain the sample set of faulty photovoltaic string current value communication abnormality generated by the VaDE model, and record the sample set as C, and combine sample set A and sample set C to obtain the combined sample set D. 
Use the sample set D to train the XGBoost model to obtain the trained XGBoost model; input the normalized value of the current of each photovoltaic string in the photovoltaic power plant into the trained XGBoost model during the period to be tested, and identify the PV strings with abnormal current value communication in the PV power plant during the measurement period. The technical solution provided by this disclosure trains the XGBoost model based on the generated data of the VaDE model to obtain a trained XGBoost model, and uses the trained model to identify whether the current communication of each string in the photovoltaic power station is abnormal, which can improve Accuracy of XGBoost model in identifying abnormal PV strings in PV power plants.
Additional aspects and advantages of the present disclosure will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the disclosure.
Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of an XGBoost-based method for identifying communication abnormalities in photovoltaic strings according to an embodiment of the present disclosure;
Fig. 2 is a structural diagram of an XGBoost-based system for identifying communication abnormalities in photovoltaic strings according to the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are illustrated in the drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it.
The present disclosure proposes an XGBoost-based method and system for identifying communication abnormalities in photovoltaic strings. The method comprises: acquiring the current values of each photovoltaic string in a photovoltaic power station within a period to be tested, together with the current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within a preset period preceding the period to be tested, and normalizing the current values; denoting the normalized current values and corresponding label data from the preset period as sample set A, and denoting the subset of A made up of the current values of faulty photovoltaic strings and their communication abnormality labels as sample set B; inputting the current values of the faulty strings in sample set B into a pre-trained VaDE model to obtain a generated sample set of faulty-string current values with communication abnormalities, denoted C, and merging sample sets A and C into sample set D; training an XGBoost model on sample set D to obtain a trained XGBoost model; and inputting the normalized current values of each photovoltaic string within the period to be tested into the trained XGBoost model to identify the strings whose current values exhibit communication abnormalities during that period. Because the XGBoost model is trained on data that includes samples generated by the VaDE model, the technical solution can improve the accuracy with which the XGBoost model identifies abnormal photovoltaic strings in a photovoltaic power station.
Embodiment 1
Fig. 1 is a flowchart of the XGBoost-based method for identifying communication abnormalities in photovoltaic strings provided in this embodiment. As shown in Fig. 1, the method comprises:
Step 1: acquire the current values of each photovoltaic string in the photovoltaic power station within the period to be tested, together with the current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within a preset period preceding the period to be tested, and normalize the current values.
In one example, the current values of each photovoltaic string in the photovoltaic power station on the detection date are acquired, together with the current values of all photovoltaic strings in the station and the corresponding communication abnormality label data for the 30 days preceding the detection date, where those 30 days include photovoltaic strings whose current values exhibit communication abnormalities.
In one example, the normalized current value of photovoltaic string m at the j-th moment within the period to be tested is calculated as follows:
Figure PCTCN2022078431-appb-000001
where the symbol shown in Figure PCTCN2022078431-appb-000002 is the normalized current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,j} is the current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,k} is the current value of photovoltaic string m at the k-th moment within the period to be tested, h is the set of all moments within the period to be tested at which data are acquired, and k is any moment in h.
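The normalization formula itself is an image that is not reproduced in this text, so its exact form cannot be confirmed here. The definitions above (a value x_{m,j} scaled against the values x_{m,k} over all moments k in h) are consistent with min-max scaling, and the following minimal sketch assumes that form. The function name `min_max_normalize` and the handling of a flat signal are illustrative assumptions, not part of the disclosure.

```python
def min_max_normalize(currents):
    """Scale one string's current readings over a period into [0, 1].

    Assumes min-max scaling; the patent's own formula is not reproduced here.
    """
    lo, hi = min(currents), max(currents)
    if hi == lo:
        # Flat signal (for example all zeros): avoid dividing by zero.
        return [0.0 for _ in currents]
    return [(x - lo) / (hi - lo) for x in currents]


# Readings of one hypothetical string m at four moments of the period h.
readings = [0.0, 2.5, 5.0, 10.0]
normalized = min_max_normalize(readings)
```

Each string is scaled against its own minimum and maximum, so strings with different rated currents become comparable before they are fed to the models.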
Step 2: denote the normalized current values of all photovoltaic strings in the photovoltaic power station and the corresponding communication abnormality label data within the preset period preceding the period to be tested as sample set A, and denote the sample set made up of the current values of the faulty photovoltaic strings in A and their communication abnormality labels as sample set B.
Step 3: input the current values of the faulty photovoltaic strings in sample set B into a pre-trained VaDE model to obtain a sample set, generated by the VaDE model, of faulty-string current values with communication abnormalities; denote this sample set as C, and merge sample sets A and C to obtain the merged sample set D.
It should be noted that the pre-trained VaDE model comprises a first neural network layer, a sampling layer and a second neural network layer.
The training process of the VaDE model comprises:
acquiring normalized current value data of faulty photovoltaic strings in the photovoltaic power station over a historical period;
inputting the acquired data into the first neural network layer, the sampling layer and the second neural network layer of an initial VaDE model, taking the lower bound of the marginal likelihood as the loss function of the model, and training the model with the adaptive moment estimation (Adam) optimization algorithm to obtain the trained VaDE model.
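The role of the sampling layer between the two neural network layers can be illustrated with a short sketch. It assumes the usual variational-autoencoder arrangement, in which the first network outputs a mean and a log-variance per latent dimension and the sampling layer draws z = mu + sigma * eps; the disclosure does not spell out these details, so the function name `sample_latent` and the example values are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Sampling layer sketch: z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and log_var stand for the (assumed) outputs of the first neural
    network layer; the result would be passed to the second layer.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)                      # fixed seed for repeatability
mu, log_var = [0.2, -0.1], [-2.0, -2.0]     # hypothetical encoder outputs
z = sample_latent(mu, log_var, rng)
```

Writing the draw as a deterministic function of mu, log_var and independent noise is what lets gradient-based optimizers such as Adam train the layers on both sides of the sampling step.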
Step 4: train the XGBoost model on sample set D to obtain the trained XGBoost model.
Step 5: input the normalized current values of each photovoltaic string in the photovoltaic power station within the period to be tested into the trained XGBoost model, and identify the photovoltaic strings whose current values exhibit communication abnormalities during that period.
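The construction of sample sets A, B, C and D in Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions: samples are represented as (normalized current curve, label) pairs, and the pre-trained VaDE model is replaced by an arbitrary callable `generate_fault_samples`, which is a hypothetical stand-in and not a VaDE implementation.

```python
def build_training_set(history, generate_fault_samples):
    """Assemble sample sets A, B, C and D.

    history: list of (normalized_currents, label) pairs, label 1 = abnormal.
    generate_fault_samples: stand-in for the pre-trained VaDE model; any
        callable mapping faulty curves to synthetic faulty curves.
    """
    sample_a = list(history)
    sample_b = [(x, y) for x, y in sample_a if y == 1]
    generated = generate_fault_samples([x for x, _ in sample_b])
    sample_c = [(x, 1) for x in generated]   # generated samples are abnormal
    sample_d = sample_a + sample_c           # D merges A and C
    return sample_a, sample_b, sample_c, sample_d


history = [([0.0, 0.5, 1.0], 0), ([0.0, 0.0, 0.0], 1), ([0.1, 0.6, 1.0], 0)]
echo = lambda curves: [list(c) for c in curves]   # placeholder, not VaDE
a, b, c, d = build_training_set(history, echo)
```

Abnormal strings are typically rare in the historical data, so adding the generated set C to A increases the share of abnormal samples that the classifier sees during training.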
In this embodiment of the present disclosure, training the XGBoost model on sample set D to obtain the trained XGBoost model comprises:
acquiring sample set D, where D = {(x_i, y_i)} (|D| = n, x_i ∈ R^m, y_i ∈ R), x_i is the indicator values of the i-th sample at each moment of a day, y_i is the label indicating whether the indicator of the i-th sample exhibits a communication abnormality during that day, y_i ∈ {0, 1}, 0 denotes normal communication, 1 denotes abnormal communication, n is the number of samples, and m is the number of sample features;
constructing gradient boosting trees based on sample set D, using CART regression trees as the sub-tree models, adding CART regression trees iteratively, and combining all the CART regression trees to obtain the trained XGBoost model.
It should be noted that, when a CART regression tree is added, a greedy algorithm is used: for each sample feature, the split point with the largest gain is found first; then, across all sample features, the feature with the largest gain is found. In other words, the split point with the largest gain is selected in a two-level loop over features and split points, and the split is performed at that point to grow the CART regression tree.
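The two-level greedy search over features and split points can be sketched in plain Python. The gain expression used here is the standard one from the XGBoost literature, computed from first-order and second-order gradient statistics; the disclosure does not state its gain formula, so that choice, together with the names `best_split`, `lam` and `gamma`, is an assumption made for illustration.

```python
def best_split(features, grads, hess, lam=1.0, gamma=0.0):
    """Return (gain, feature_index, threshold) of the best split, or None.

    features: list of rows, one list of m feature values per sample.
    grads, hess: per-sample first and second derivatives of the loss.
    """
    def score(g, h):
        return g * g / (h + lam)

    g_total, h_total = sum(grads), sum(hess)
    best = None
    for f in range(len(features[0])):                # outer loop: features
        order = sorted(range(len(features)), key=lambda i: features[i][f])
        g_left = h_left = 0.0
        for pos in range(len(order) - 1):            # inner loop: split points
            i, nxt = order[pos], order[pos + 1]
            g_left += grads[i]
            h_left += hess[i]
            if features[i][f] == features[nxt][f]:
                continue                             # no split between equal values
            gain = 0.5 * (score(g_left, h_left)
                          + score(g_total - g_left, h_total - h_left)
                          - score(g_total, h_total)) - gamma
            if best is None or gain > best[0]:
                best = (gain, f, (features[i][f] + features[nxt][f]) / 2)
    return best


# Four samples, two features; labels 0, 0, 1, 1 with an initial prediction
# of 0.5 under logistic loss give grad = p - y and hess = p * (1 - p).
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
g = [0.5, 0.5, -0.5, -0.5]
h = [0.25, 0.25, 0.25, 0.25]
gain, feature, threshold = best_split(X, g, h)
```

In this toy data the first feature separates the two classes cleanly, so the search selects it with a threshold midway between 0.2 and 0.8.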
Specifically, an XGBoost model is defined to predict y in D. First, gradient boosting trees are constructed: trees are accumulated based on sample set D, one tree is built at a time, and CART regression trees are used as the sub-tree models. Second, the loss function with the regularization term for the t-th iteration is constructed:
Figure PCTCN2022078431-appb-000003
where l is the loss function, specified within the algorithm, between the true value y_i and the predicted value shown in Figure PCTCN2022078431-appb-000004, and is a differentiable convex function; f_t(x_i) is the weight of the leaf node to which the i-th sample is assigned in the t-th iteration; and Ω(f_t) is the regularization term, expressed in terms of the number of leaf nodes and the leaf node weights of the tree in the t-th iteration. Next, a second-order Taylor expansion of L^(t) is performed at the point shown in Figure PCTCN2022078431-appb-000005, and the weights are determined by minimizing the loss function. Finally, all the CART regression trees are combined to obtain the trained XGBoost model.
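The equation images referenced above are not reproduced in this text. The surrounding description matches the standard XGBoost objective, which is written out below as a hedged reconstruction rather than as the patent's exact formulas. The symbols T (number of leaves), w_j (weight of leaf j), I_j (set of samples assigned to leaf j), γ, λ, g_i and h_i follow the XGBoost literature and are assumptions here.

```latex
% Regularized objective at iteration t (standard XGBoost form, assumed)
L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

% Second-order Taylor expansion around the previous prediction \hat{y}_i^{(t-1)}
L^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^{2} \right] + \Omega(f_t),
\quad
g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},\
h_i = \frac{\partial^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}

% Minimizing over the weight of leaf j, with I_j the samples in that leaf
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

Under this form, the first term of the expansion is constant at iteration t, so only the gradient statistics g_i and h_i are needed to choose splits and leaf weights.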
It should be noted that the indicator value acquired here for the photovoltaic strings is the current value; other indicator values of the photovoltaic strings may also be acquired, and no limitation is imposed here.
In summary, the XGBoost-based method for identifying communication abnormalities in photovoltaic strings provided by this embodiment of the present disclosure trains the XGBoost model on data that includes samples generated by the VaDE model, and uses the trained model to identify whether the current communication of each string in the photovoltaic power station is abnormal, which can improve the accuracy with which the XGBoost model identifies abnormal photovoltaic strings in the station.
Embodiment 2
Fig. 2 is a structural diagram of an XGBoost-based system for identifying communication abnormalities in photovoltaic strings provided by an embodiment of the present disclosure. As shown in Fig. 2, the system comprises:
an acquisition module, configured to acquire the current values of each photovoltaic string in the photovoltaic power station within the period to be tested, together with the current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within a preset period preceding the period to be tested, and to normalize the current values;
a first sample module, configured to denote the normalized current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within the preset period as sample set A, and to denote the sample set made up of the current values of the faulty photovoltaic strings in A and their communication abnormality labels as sample set B;
a second sample module, configured to input the current values of the faulty photovoltaic strings in sample set B into the pre-trained VaDE model to obtain a sample set, generated by the VaDE model, of faulty-string current values with communication abnormalities, to denote this sample set as C, and to merge sample sets A and C to obtain the merged sample set D;
a training module, configured to train the XGBoost model on sample set D to obtain the trained XGBoost model;
an identification module, configured to input the normalized current values of each photovoltaic string in the station within the period to be tested into the trained XGBoost model and to identify the photovoltaic strings whose current values exhibit communication abnormalities during that period.
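The way the five modules hand data to one another can be sketched as a small composition of callables. This is only an illustration of the data flow: the class name, field names and the toy stand-ins for normalization, VaDE generation and XGBoost training are all hypothetical and are not the disclosed implementations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Curve = List[float]
Labeled = Tuple[Curve, int]


@dataclass
class AnomalyIdentificationSystem:
    normalize: Callable[[Curve], Curve]                       # acquisition module
    generate: Callable[[List[Curve]], List[Curve]]            # VaDE stand-in
    train: Callable[[List[Labeled]], Callable[[Curve], int]]  # training module

    def run(self, history: List[Labeled],
            current: Dict[str, Curve]) -> Dict[str, int]:
        a = [(self.normalize(x), y) for x, y in history]      # sample set A
        b = [x for x, y in a if y == 1]                       # faulty curves of B
        d = a + [(x, 1) for x in self.generate(b)]            # D = A merged with C
        model = self.train(d)
        # Identification module: label each string in the period to be tested.
        return {name: model(self.normalize(x)) for name, x in current.items()}


system = AnomalyIdentificationSystem(
    normalize=lambda x: x,                                    # toy: identity
    generate=lambda curves: [],                               # toy: no samples
    train=lambda d: (lambda x: int(max(x) == 0.0)),           # toy rule, not XGBoost
)
history = [([0.0, 0.0], 1), ([0.2, 1.0], 0)]
result = system.run(history, {"s1": [0.0, 0.0], "s2": [0.1, 0.9]})
```

Keeping each module behind a plain callable makes it possible to swap the toy stand-ins for a real normalizer, generator and classifier without changing the data flow.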
In this embodiment of the present disclosure, the normalized current value of photovoltaic string m at the j-th moment within the period to be tested is calculated as follows:
Figure PCTCN2022078431-appb-000006
where the symbol shown in Figure PCTCN2022078431-appb-000007 is the normalized current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,j} is the current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,k} is the current value of photovoltaic string m at the k-th moment within the period to be tested, h is the set of all moments within the period to be tested at which data are acquired, and k is any moment in h.
In this embodiment of the present disclosure, the pre-trained VaDE model comprises a first neural network layer, a sampling layer and a second neural network layer.
The training process of the VaDE model comprises:
acquiring normalized current value data of faulty photovoltaic strings in the photovoltaic power station over a historical period;
inputting the acquired data into the first neural network layer, the sampling layer and the second neural network layer of an initial VaDE model, taking the lower bound of the marginal likelihood as the loss function of the model, and training the model with the adaptive moment estimation (Adam) optimization algorithm to obtain the trained VaDE model.
In this embodiment of the present disclosure, the training module comprises:
an acquisition unit, configured to acquire sample set D, where D = {(x_i, y_i)} (|D| = n, x_i ∈ R^m, y_i ∈ R), x_i is the indicator values of the i-th sample at each moment of a day, y_i is the label indicating whether the indicator of the i-th sample exhibits a communication abnormality during that day, y_i ∈ {0, 1}, 0 denotes normal communication, 1 denotes abnormal communication, n is the number of samples, and m is the number of sample features;
a training unit, configured to construct gradient boosting trees based on sample set D, using CART regression trees as the sub-tree models, adding CART regression trees iteratively, and combining all the CART regression trees to obtain the trained XGBoost model.
It should be noted that, when a CART regression tree is added, a greedy algorithm is used: for each sample feature, the split point with the largest gain is found first; then, across all sample features, the feature with the largest gain is found. In other words, the split point with the largest gain is selected in a two-level loop over features and split points, and the split is performed at that point to grow the CART regression tree.
In summary, the XGBoost-based system for identifying communication abnormalities in photovoltaic strings provided by the present disclosure trains the XGBoost model on data that includes samples generated by the VaDE model, and uses the trained model to identify whether the current communication of each string in the photovoltaic power station is abnormal, which can improve the accuracy with which the XGBoost model identifies abnormal photovoltaic strings in the station.
An embodiment of the present disclosure further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the XGBoost-based method for identifying communication abnormalities in photovoltaic strings described in Embodiment 1.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the XGBoost-based method for identifying communication abnormalities in photovoltaic strings described in Embodiment 1.
An embodiment of the present disclosure further provides a computer program product comprising computer program code, where the computer program code, when run on a computer, performs the XGBoost-based method for identifying communication abnormalities in photovoltaic strings described in Embodiment 1.
An embodiment of the present disclosure further provides a computer program comprising computer program code, where the computer program code, when run on a computer, causes the computer to perform the XGBoost-based method for identifying communication abnormalities in photovoltaic strings described in Embodiment 1.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing custom logical functions or steps of the process. The scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present disclosure belong.
Although embodiments of the present disclosure have been shown and described above, it is to be understood that these embodiments are exemplary and should not be construed as limiting the present disclosure; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present disclosure.

Claims (14)

  1. An XGBoost-based method for identifying communication abnormalities in photovoltaic strings, characterized in that the method comprises:
    acquiring the current values of each photovoltaic string in a photovoltaic power station within a period to be tested, together with the current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within a preset period preceding the period to be tested, and normalizing the current values;
    denoting the normalized current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within the preset period as sample set A, and denoting the sample set made up of the current values of the faulty photovoltaic strings in A and their communication abnormality labels as sample set B;
    inputting the current values of the faulty photovoltaic strings in sample set B into a pre-trained VaDE model to obtain a sample set, generated by the VaDE model, of faulty-string current values with communication abnormalities, denoting this sample set as C, and merging sample sets A and C to obtain a merged sample set D;
    training an XGBoost model on sample set D to obtain a trained XGBoost model;
    inputting the normalized current values of each photovoltaic string in the station within the period to be tested into the trained XGBoost model, and identifying the photovoltaic strings whose current values exhibit communication abnormalities during that period.
  2. The method according to claim 1, characterized in that the normalized current value of photovoltaic string m at the j-th moment within the period to be tested is calculated as follows:
    Figure PCTCN2022078431-appb-100001
    where the symbol shown in Figure PCTCN2022078431-appb-100002 is the normalized current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,j} is the current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,k} is the current value of photovoltaic string m at the k-th moment within the period to be tested, h is the set of all moments within the period to be tested at which data are acquired, and k is any moment in h.
  3. The method according to claim 1 or 2, characterized in that the pre-trained VaDE model comprises a first neural network layer, a sampling layer and a second neural network layer;
    the training process of the VaDE model comprises:
    acquiring normalized current value data of faulty photovoltaic strings in the photovoltaic power station over a historical period;
    inputting the acquired data into the first neural network layer, the sampling layer and the second neural network layer of an initial VaDE model, taking the lower bound of the marginal likelihood as the loss function of the model, and training the model with the adaptive moment estimation (Adam) optimization algorithm to obtain the trained VaDE model.
  4. The method according to any one of claims 1 to 3, characterized in that training the XGBoost model on sample set D to obtain the trained XGBoost model comprises:
    acquiring sample set D, where D = {(x_i, y_i)} (|D| = n, x_i ∈ R^m, y_i ∈ R), x_i is the indicator values of the i-th sample at each moment of a day, y_i is the label indicating whether the indicator of the i-th sample exhibits a communication abnormality during that day, y_i ∈ {0, 1}, 0 denotes normal communication, 1 denotes abnormal communication, n is the number of samples, and m is the number of sample features;
    constructing gradient boosting trees based on sample set D, using CART regression trees as the sub-tree models, adding CART regression trees iteratively, and combining all the CART regression trees to obtain the trained XGBoost model.
  5. The method according to claim 4, characterized in that, when a CART regression tree is added, a greedy algorithm is used to first find, for each sample feature, the split point with the largest gain, and then to find, across all sample features, the feature with the largest gain; the split point with the largest gain is selected in a two-level loop over features and split points, and the split is performed at that point to grow the CART regression tree.
  6. An XGBoost-based system for identifying communication abnormalities in photovoltaic strings, characterized in that the system comprises:
    an acquisition module, configured to acquire the current values of each photovoltaic string in a photovoltaic power station within a period to be tested, together with the current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within a preset period preceding the period to be tested, and to normalize the current values;
    a first sample module, configured to denote the normalized current values of all photovoltaic strings in the station and the corresponding communication abnormality label data within the preset period as sample set A, and to denote the sample set made up of the current values of the faulty photovoltaic strings in A and their communication abnormality labels as sample set B;
    a second sample module, configured to input the current values of the faulty photovoltaic strings in sample set B into a pre-trained VaDE model to obtain a sample set, generated by the VaDE model, of faulty-string current values with communication abnormalities, to denote this sample set as C, and to merge sample sets A and C to obtain a merged sample set D;
    a training module, configured to train an XGBoost model on sample set D to obtain a trained XGBoost model;
    an identification module, configured to input the normalized current values of each photovoltaic string in the station within the period to be tested into the trained XGBoost model and to identify the photovoltaic strings whose current values exhibit communication abnormalities during that period.
  7. The system according to claim 6, characterized in that the normalized current value of photovoltaic string m at the j-th moment within the period to be tested is calculated as follows:
    Figure PCTCN2022078431-appb-100003
    where the symbol shown in Figure PCTCN2022078431-appb-100004 is the normalized current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,j} is the current value of photovoltaic string m at the j-th moment within the period to be tested, x_{m,k} is the current value of photovoltaic string m at the k-th moment within the period to be tested, h is the set of all moments within the period to be tested at which data are acquired, and k is any moment in h.
  8. The system according to claim 6 or 7, wherein the pre-trained VaDE model comprises a first neural network layer, a sampling layer, and a second neural network layer;
    the training process of the VaDE model comprises:
    obtaining normalized current value data of the faulty photovoltaic strings in the photovoltaic power station within a historical period;
    inputting the obtained data into the first neural network layer, the sampling layer, and the second neural network layer of an initial VaDE model, using the lower bound of the marginal likelihood as the loss function of the model, and training the model with the adaptive moment estimation (Adam) optimization algorithm to obtain the trained VaDE model.
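Two pieces of the training process in claim 8 can be illustrated in isolation: the sampling layer and a single Adam update. The following Python is a sketch under stated assumptions, not the patented implementation. It omits the two neural network layers and VaDE's Gaussian mixture prior, the function names `sample_latent` and `adam_step` are hypothetical, and the default hyperparameters are the commonly used Adam defaults, which the source does not specify.

```python
import math
import random
from typing import List, Tuple


def sample_latent(mu: List[float], log_var: List[float],
                  rng: random.Random) -> List[float]:
    """Sampling layer: z = mu + sigma * eps with eps ~ N(0, 1).

    mu and log_var stand in for the outputs of the first neural network
    layer (assumed here; the encoder itself is not shown).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def adam_step(param: float, grad: float, m: float, v: float, t: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


rng = random.Random(0)
# A very negative log-variance makes sigma negligible, so z is close to mu.
z = sample_latent([0.5, -1.0], [-200.0, -200.0], rng)
# First step on a parameter whose loss gradient is 2.0.
new_param, m1, v1 = adam_step(1.0, 2.0, 0.0, 0.0, t=1, lr=0.1)
```

Writing the sample as a deterministic function of `mu`, `log_var`, and external noise is what allows gradients of the marginal likelihood lower bound to flow back into the first neural network layer during training.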
  9. The system according to any one of claims 6 to 8, wherein the training module comprises:
    an acquisition unit, configured to acquire sample set D, where $D=\{(x_i, y_i)\}$ ($|D|=n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), $x_i$ is the indicator values of the i-th sample at each moment of a day, $y_i$ is the label indicating whether the indicator of the i-th sample has a communication abnormality during the day, $y_i \in \{0, 1\}$, where 0 denotes normal communication and 1 denotes abnormal communication, n is the number of samples, and m is the number of sample features;
    a training unit, configured to construct gradient boosted trees based on sample set D, using CART regression trees as the sub-tree models, adding CART regression trees iteratively, and combining all the CART regression trees to obtain the trained XGBoost model.
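The iterative, additive construction described in claim 9 can be sketched in plain Python. The example below is a simplified illustration, not XGBoost itself: it uses depth-1 regression trees (stumps) fitted by least squares to the negative gradient of the logistic loss, without XGBoost's regularization or second-order terms. All names (`fit_stump`, `boost`, `predict_proba`), the toy data, and the hyperparameters are hypothetical.

```python
import math
from typing import List, Tuple

# (feature index, threshold, left value, right value)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Fit a depth-1 regression tree to residuals r by least squares."""
    mean_r = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean_r, mean_r)  # fallback: no split
    best_err = sum((ri - mean_r) ** 2 for ri in r)
    for f in range(len(X[0])):
        # Skip the largest value so the right child is never empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((ri - lv) ** 2 for ri in left)
                   + sum((ri - rv) ** 2 for ri in right))
            if err < best_err:
                best, best_err = (f, thr, lv, rv), err
    return best


def stump_predict(s: Stump, row: List[float]) -> float:
    f, thr, lv, rv = s
    return lv if row[f] <= thr else rv


def boost(X: List[List[float]], y: List[int],
          n_rounds: int = 20, lr: float = 0.5) -> List[Stump]:
    """Add one tree per round; each tree fits the current residuals."""
    trees: List[Stump] = []
    scores = [0.0] * len(X)
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(si) for yi, si in zip(y, scores)]
        f, thr, lv, rv = fit_stump(X, residuals)
        tree = (f, thr, lr * lv, lr * rv)  # shrink the tree's contribution
        trees.append(tree)
        scores = [si + stump_predict(tree, row) for si, row in zip(scores, X)]
    return trees


def predict_proba(trees: List[Stump], row: List[float]) -> float:
    """Combine all trees: sum their outputs, then apply the sigmoid."""
    return sigmoid(sum(stump_predict(t, row) for t in trees))


# Toy data: label 1 = abnormal communication, 0 = normal.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [0, 0, 1, 1]
trees = boost(X, y)
p_abnormal = predict_proba(trees, [0.85, 0.15])
p_normal = predict_proba(trees, [0.15, 0.85])
```

The final model is simply the sum of all trees added along the way, which mirrors the "combine all the CART regression trees" step of the training unit.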
  10. The system according to claim 9, wherein, when adding a CART regression tree, a greedy algorithm is used to first find, for each sample feature, the split point with the largest gain, and then to find, across all sample features, the feature with the largest gain; the split point with the largest gain is thus selected in a two-level loop over features and split points, and the CART regression tree is grown by splitting at that split point.
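The two-level loop of claim 10 can be sketched as follows. The claim does not state the gain formula; this sketch assumes the standard XGBoost structure-score gain computed from first-order gradients and second-order Hessians, with regularization parameters `lam` and `gamma`. The function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def split_gain(gl: float, hl: float, gr: float, hr: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Gain of a split (ASSUMED standard XGBoost form, not from the claim)."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr)
                  - score(gl + gr, hl + hr)) - gamma


def best_split(X: List[List[float]], grad: List[float], hess: List[float],
               lam: float = 1.0, gamma: float = 0.0
               ) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, gain) of the best split, or None."""
    g_total, h_total = sum(grad), sum(hess)
    best: Optional[Tuple[int, float, float]] = None
    best_gain = 0.0  # only accept splits with strictly positive gain
    for f in range(len(X[0])):                       # outer loop: features
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        gl = hl = 0.0
        for pos in range(len(order) - 1):            # inner loop: split points
            i, nxt = order[pos], order[pos + 1]
            gl += grad[i]
            hl += hess[i]
            if X[i][f] == X[nxt][f]:
                continue  # cannot split between equal feature values
            gain = split_gain(gl, hl, g_total - gl, h_total - hl, lam, gamma)
            if gain > best_gain:
                best_gain = gain
                best = (f, 0.5 * (X[i][f] + X[nxt][f]), gain)
    return best


# Toy data with logistic-loss gradients at an initial score of 0:
# p = 0.5, so g = p - y and h = p * (1 - p) = 0.25 for every sample.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [0, 0, 1, 1]
grad = [0.5 - yi for yi in y]
hess = [0.25] * len(y)
best = best_split(X, grad, hess)
```

Sorting each feature once and accumulating the left-side gradient sums lets every candidate split point be scored in a single pass, which is the usual way the exact greedy search is organized.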
  11. An electronic device, comprising:
    a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1-5.
  12. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
  13. A computer program product, comprising computer program code which, when run on a computer, executes the method according to any one of claims 1-5.
  14. A computer program, comprising computer program code which, when run on a computer, causes the computer to execute the method according to any one of claims 1-5.
PCT/CN2022/078431 2021-11-17 2022-02-28 Photovoltaic string communication abnormality identification method and system based on xgboost WO2023087569A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111362748.2A CN114298084A (en) 2021-11-17 2021-11-17 XGboost-based photovoltaic group string communication abnormity identification method and system
CN202111362748.2 2021-11-17

Publications (1)

Publication Number Publication Date
WO2023087569A1 true WO2023087569A1 (en) 2023-05-25

Family

ID=80965317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078431 WO2023087569A1 (en) 2021-11-17 2022-02-28 Photovoltaic string communication abnormality identification method and system based on xgboost

Country Status (2)

Country Link
CN (1) CN114298084A (en)
WO (1) WO2023087569A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117578405A (en) * 2023-11-03 2024-02-20 华能国际电力股份有限公司河北清洁能源分公司 Photovoltaic power plant inverter string abnormality analysis method and device
CN118194243A (en) * 2024-03-20 2024-06-14 北京智盟信通科技有限公司 Photovoltaic combiner box online early warning method based on AI algorithm and edge calculation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758002B (en) * 2022-06-15 2022-09-02 南开大学 Photovoltaic string position determining method and system based on aerial image
CN117114254B (en) * 2023-10-25 2024-03-19 山东电力工程咨询院有限公司 Power grid new energy abnormal data monitoring method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426956A (en) * 2015-11-06 2016-03-23 国家电网公司 Ultra-short-period photovoltaic prediction method
CN105827200A (en) * 2016-03-01 2016-08-03 华为技术有限公司 Photoelectric system battery pack string fault identification method, device and equipment
CN109583515A (en) * 2018-12-20 2019-04-05 福州大学 A kind of photovoltaic power generation fault detection and classification method based on BP_Adaboost
US10311584B1 (en) * 2017-11-09 2019-06-04 Facebook Technologies, Llc Estimation of absolute depth from polarization measurements
CN112364477A (en) * 2020-09-29 2021-02-12 中国电器科学研究院股份有限公司 Outdoor empirical prediction model library generation method and system
US20210165130A1 (en) * 2018-06-14 2021-06-03 Siemens Aktiengesellschaft Predicting sun light irradiation intensity with neural network operations

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021806B (en) * 2016-06-06 2018-10-30 福州大学 A kind of photovoltaic string formation method for diagnosing faults based on kernel function extreme learning machine
CN109409420B (en) * 2018-10-08 2022-05-03 西安热工研究院有限公司 Photovoltaic string fault diagnosis method under non-uniform irradiance
CN110995459B (en) * 2019-10-12 2021-12-14 平安科技(深圳)有限公司 Abnormal object identification method, device, medium and electronic equipment
CN112782495A (en) * 2019-11-06 2021-05-11 成都鼎桥通信技术有限公司 String abnormity identification method for photovoltaic power station

Also Published As

Publication number Publication date
CN114298084A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
WO2023087569A1 (en) Photovoltaic string communication abnormality identification method and system based on xgboost
CN111914644B (en) Dual-mode cooperation based weak supervision time sequence action positioning method and system
Sridharan et al. Visual fault detection in photovoltaic modules using decision tree algorithms with deep learning features
CN112101085B (en) Intelligent fault diagnosis method based on importance weighted domain antagonism self-adaptation
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN113869418B (en) Small sample ship target identification method based on global attention relation network
CN103020485B (en) Based on the short-term wind speed forecasting method of beta noise core ridge regression technology
CN106156805A (en) A kind of classifier training method of sample label missing data
CN105787521A (en) Semi-monitoring crowdsourcing marking data integration method facing imbalance of labels
CN114842371B (en) Unsupervised video anomaly detection method
He et al. A unified label noise-tolerant framework of deep learning-based fault diagnosis via a bounded neural network
CN114048546B (en) Method for predicting residual service life of aeroengine based on graph convolution network and unsupervised domain self-adaption
CN117829822B (en) Power transformer fault early warning method and system
CN117891960B (en) Multi-mode hash retrieval method and system based on adaptive gradient modulation
TW202009803A (en) Prediction system and method for solar photovoltaic power generation
Yang Realization of vehicle classification system based on deep learning
CN116680624B (en) Classification method, system and storage medium for metadata of power system
CN116451118B (en) Deep learning-based radar photoelectric outlier detection method
CN118132934A (en) Real-time state analysis method and system for machine tool spindle
CN116932763A (en) Hierarchical multi-label professional technical document classification method and system using label information
CN117390413A (en) Recognition method for distributed power optical fiber vibration signal noise reduction and time sequence feature extraction
CN115114990A (en) Power distribution network state online detection method based on graph neural network
CN116304789A (en) VP inclinometer fault diagnosis method and device
CN109800923B (en) Short-term power combination prediction method for distributed wind power generation
CN113987697A (en) Mechanical equipment fault diagnosis method based on vibration data