CN109840541A

CN109840541A - A power grid transformer fault classification method based on XGBoost

Info

Publication number: CN109840541A
Application number: CN201811482922.5A
Authority: CN
Inventors: 沈力; 杜红军; 郭昆亚; 陈硕; 乔林; 冉冉; 周巧妮; 郭哲强; 吕旭明; 卢彬; 李静; 刘云飞
Original assignee: Nanjing University of Aeronautics and Astronautics; Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Current assignee: Nanjing University of Aeronautics and Astronautics; State Grid Corp of China SGCC; Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Priority date: 2018-12-05
Filing date: 2018-12-05
Publication date: 2019-06-04

Abstract

The present invention provides a power grid transformer fault classification method based on XGBoost. The method comprises the following steps. Step 1: obtain and integrate the DGA data sets of multiple transformers. Step 2: normalize the data in the DGA data set obtained in Step 1 and pass the preprocessed data to XGBoost for training; a certain number of classification and regression trees are constructed, each fitting the residual of the previous round of learning, and the optimal XGBoost parameter combination is found by grid search, thereby improving the diagnostic accuracy for transformer faults.

Description

A power grid transformer fault classification method based on XGBoost

Technical field

The invention belongs to the technical field of transformer fault diagnosis, and in particular relates to a power grid transformer fault classification method based on XGBoost.

Background technique

Power transformers convert the electrical power generated by power plants and deliver it to users (customers) in factories and companies around the world. The power transformer is one of the key pieces of equipment in a power system, and its operating state largely determines whether the grid can work well. Ensuring that power transformers remain in good operating condition is therefore very important: only then can they provide the reliable and continuous supply of electricity that real-world use requires. At present, many power companies apply various condition assessment and maintenance measures to their transformers, and dissolved gas analysis (DGA) is one of them. DGA detects and predicts transformer faults on the basis of the concentrations of the gases dissolved in the transformer insulating oil and their generation rates; examples include the key gas method, the IEC ratio method, the Rogers ratio method and the Dornenburg ratio method.

DGA methods can give some estimate of the operating state of a transformer. Unfortunately, although they are simple to apply, they suffer from problems such as incomplete coding and overly rigid boundaries. The methods often give different predictions for the same case, which shows that they are quite inaccurate; as a result many faults cannot be found correctly and in time, and testers are left in difficulty. These problems have driven many researchers to study machine-learning-based methods for transformer fault diagnosis. With the development of intelligent algorithms and machine learning, techniques such as support vector machines, neural networks, classification and regression trees and principal component analysis have gradually been applied to transformer fault diagnosis and have achieved some success. However, because transformer data are often very difficult to obtain and very scarce, these intelligent algorithms, although they can produce fairly good classification results, often fall into over-fitting. For example, support vector machines are good at handling small-sample data, but because they are binary classifiers in nature they are inefficient on multi-class problems (transformer fault classification is a multi-class problem covering several fault types). Neural networks have strong learning ability, but on small-sample data they easily fall into local optima, which leads to over-fitting. Classification and regression trees are efficient, but a single tree has weak learning ability and is also prone to over-fitting.

Summary of the invention

The object of the invention is to provide a power grid transformer fault classification method based on XGBoost that addresses the defects or problems of the prior art.

The technical solution of the invention is as follows: a power grid transformer fault classification method based on XGBoost comprises the following steps:

Step 1: obtain and integrate the DGA data sets of multiple transformers;

Step 2: normalize the data in the DGA data set obtained in Step 1 and pass the preprocessed data to XGBoost for training; construct a certain number of classification and regression trees to fit the residual of the previous round of learning, and find the optimal XGBoost parameter combination by grid search, thereby improving the diagnostic accuracy for transformer faults.
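The normalization in Step 2 is not specified further. As a minimal sketch, assuming simple min-max scaling of each feature column to [0, 1] (the function name and the sample values are hypothetical), it could look like this:

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    n_features = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_features)]
    highs = [max(r[j] for r in rows) for j in range(n_features)]
    scaled = []
    for r in rows:
        scaled.append([
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ])
    return scaled


# Hypothetical gas concentrations in ppm (columns: H2, CH4, C2H2).
samples = [[10.0, 5.0, 0.0], [110.0, 25.0, 2.0], [60.0, 45.0, 4.0]]
normalized = min_max_normalize(samples)
```

Tree models are largely insensitive to monotonic rescaling, so the choice of scaler mainly matters for keeping data from different sources on a comparable footing.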

Preferably, in Step 2, the XGBoost model specifically comprises:

Input: the input is the transformer data set after preliminary screening and normalization. The data set can be expressed as D = {X₁, X₂, X₃, …, X_d}, where d is the number of samples in the data set; X_i = {x₁, x₂, …, x_n, y} denotes one sample, x_i denotes the feature in each dimension, and y ∈ {0, 1, 2, 3, 4, 5, 6, 7} denotes the 8 fault types of the transformer;

Output: XGBoost is configured with softmax as the objective and returns the predicted class;

Loss function: XGBoost requires the loss function to be a differentiable convex loss function.
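The input, output and loss settings above map onto a small parameter dictionary. The sketch below uses parameter names from the XGBoost library (`objective`, `num_class`, `eval_metric`, `max_depth`, `eta`, `lambda`, `gamma`); the numeric values are illustrative assumptions and are not taken from this document.

```python
# Eight fault types, labelled 0..7 as in the input definition.
NUM_FAULT_TYPES = 8

xgb_params = {
    "objective": "multi:softmax",           # softmax objective, returns the predicted class
    "num_class": NUM_FAULT_TYPES,           # required for multi-class objectives
    "eval_metric": ["merror", "mlogloss"],  # multi-class error rate and log loss
    "max_depth": 4,                         # assumed value, to be tuned
    "eta": 0.1,                             # assumed learning rate, to be tuned
    "lambda": 1.0,                          # L2 regularization on leaf weights
    "gamma": 0.0,                           # minimum loss reduction required for a split
}
```

A dictionary of this shape would typically be passed to the library's training function together with the normalized data set.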

The technical solution provided by the invention has the following beneficial effects:

The XGBoost-based power grid transformer fault classification method integrates DGA gas data from different sources and combines them with the ratios of different gases, forming a relatively large DGA data set;

Moreover, applying ensemble learning (XGBoost) to transformer fault diagnosis and classification allows a small number of missing values in the transformer DGA data (which is very common because of the harsh operating environment of transformers). Through the ensemble learning ability of XGBoost, a large number of classification and regression trees are built that can handle highly skewed and multi-modal continuous data, and the optimal model is obtained by grid search;

In addition, the DGA data set obtained after preprocessing is passed to XGBoost for training; a large number of classification and regression trees are constructed, continually fitting the previous prediction results, to improve the diagnostic accuracy for transformer faults.
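The grid search mentioned above can be sketched as an exhaustive loop over parameter combinations. In the sketch below the grid values and the scoring function are placeholders: in practice the scorer would train XGBoost with each combination and return a validation accuracy, whereas here it is a purely synthetic formula so that the example runs on its own.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(grid: Dict[str, List], score: Callable[[Dict], float]) -> Tuple[Dict, float]:
    """Evaluate every parameter combination and return the highest-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: Dict) -> float:
    """Synthetic stand-in for 'train the model and measure accuracy'."""
    return -abs(params["max_depth"] - 4) - abs(params["eta"] - 0.1)


grid = {"max_depth": [2, 4, 6], "eta": [0.05, 0.1, 0.3]}
best, best_value = grid_search(grid, toy_score)
```

The cost grows with the product of the grid sizes, so the grid is usually kept to a handful of values per parameter.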

Specific embodiment

In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.

Unless the context clearly indicates otherwise, the elements and components in the present invention may exist either singly or in plural, and the invention places no limitation on this. Although the steps in the invention are arranged with labels, the labels are not intended to limit the order of the steps; unless the order of steps is expressly stated, or the execution of a step requires other steps as its basis, the relative order of the steps may be adjusted. It will be understood that the term "and/or" as used herein refers to and covers any and all possible combinations of one or more of the associated listed items.

The power grid transformer fault classification method based on XGBoost provided by the invention comprises the following steps:

Step 1: obtain and integrate the DGA data sets of multiple transformers;
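Step 1 can be illustrated by merging records from several sources and appending gas-ratio features. The document does not say which ratios are used; the sketch below assumes the three ratios of the IEC ratio method (C2H2/C2H4, CH4/H2, C2H4/C2H6), and the records themselves are hypothetical. A zero denominator yields NaN, so the value is treated as missing.

```python
import math
from typing import Dict, List


def safe_ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero (treated as missing)."""
    return num / den if den != 0 else math.nan


def add_ratio_features(gas: Dict[str, float]) -> Dict[str, float]:
    """Copy one DGA record and append three gas-ratio features."""
    out = dict(gas)
    out["C2H2/C2H4"] = safe_ratio(gas["C2H2"], gas["C2H4"])
    out["CH4/H2"] = safe_ratio(gas["CH4"], gas["H2"])
    out["C2H4/C2H6"] = safe_ratio(gas["C2H4"], gas["C2H6"])
    return out


# Hypothetical records (ppm) from two sources, merged into one data set.
source_a = [{"H2": 100.0, "CH4": 50.0, "C2H6": 20.0, "C2H4": 40.0, "C2H2": 10.0}]
source_b = [{"H2": 0.0, "CH4": 8.0, "C2H6": 4.0, "C2H4": 2.0, "C2H2": 0.0}]
dataset: List[Dict[str, float]] = [add_ratio_features(r) for r in source_a + source_b]
```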

Specifically, extreme gradient boosting (Extreme Gradient Boosting, XGBoost) is a scalable machine learning system for tree boosting, and is a novel classifier based on an ensemble of classification and regression trees (CART).

XGBoost supports parallelism at the feature granularity. Before training, XGBoost sorts the data in advance and stores it in a block structure; this structure is reused in later iterations, which greatly reduces the amount of computation.
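The pre-sorting idea can be sketched in a few lines. This is a deliberately simplified illustration rather than the library's actual block layout: for each feature column the sample indices are sorted once by feature value, and the stored order can then be reused by every later split search instead of sorting again.

```python
from typing import List


def build_sorted_blocks(rows: List[List[float]]) -> List[List[int]]:
    """For each feature column, store the sample indices sorted by feature value."""
    n_features = len(rows[0])
    return [
        sorted(range(len(rows)), key=lambda i, j=j: rows[i][j])
        for j in range(n_features)
    ]


# Hypothetical data: three samples with two features each.
rows = [[0.3, 7.0], [0.1, 9.0], [0.2, 8.0]]
blocks = build_sorted_blocks(rows)
```

Because each column is sorted independently, the columns can also be scanned in parallel, which is what feature-level parallelism refers to.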

For a data set with n samples and m-dimensional features, $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), XGBoost is a tree ensemble model that predicts the target output as the sum of K functions:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{1}$$

where $F = \{ f(x) = w_{q(x)} \}$ ($q: \mathbb{R}^m \to \{1, \dots, T\}$, $w \in \mathbb{R}^T$) is the set of classification and regression trees (CART). q maps a sample $x_i$ to a leaf node of a CART tree and thus represents the structure of a tree; T is the number of leaf nodes in the tree. Each $f_k$ corresponds to a mapping q together with the weights (scores) of its leaf nodes; these weights are continuous values, which helps to realize efficient optimization algorithms. $w_i$ denotes the weight of the i-th leaf node.
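The description of a tree as a structure mapping q plus leaf weights w, with the prediction summed over K trees, can be made concrete with a small sketch. The class name, the two stumps and their weights are hypothetical.

```python
from typing import Callable, List, Sequence


class Tree:
    """f(x) = w[q(x)]: q maps a sample to a leaf index, w holds the leaf weights."""

    def __init__(self, q: Callable[[Sequence[float]], int], w: List[float]):
        self.q = q
        self.w = w

    def __call__(self, x: Sequence[float]) -> float:
        return self.w[self.q(x)]


def predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Additive prediction: the sum of the outputs of all trees."""
    return sum(tree(x) for tree in trees)


# Two hypothetical stumps, each with T = 2 leaves.
tree1 = Tree(lambda x: 0 if x[0] < 0.5 else 1, [-0.5, 0.5])
tree2 = Tree(lambda x: 0 if x[1] < 0.2 else 1, [0.25, -0.25])
score = predict([tree1, tree2], [0.7, 0.1])  # 0.5 from tree1 plus 0.25 from tree2
```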

To obtain the final XGBoost model, a series of classification and regression trees must be trained. The objective function of this ensemble learning model is defined in formula (2). l denotes a differentiable convex loss function that measures the gap between the true label $y_i$ and the predicted label $\hat{y}_i$. Ω is a regularization term, obtained by summing the regularization terms of the K trees; it smooths the finally learned weights and penalizes model complexity, preventing over-fitting. Such an objective tends to select a model composed of a series of relatively simple prediction functions, which has stronger generalization ability. If the regularization term is set to 0, formula (2) reduces to traditional gradient tree boosting.
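Formula (2) is not reproduced in the text. Assuming it follows the standard XGBoost formulation, with γ and λ as regularization coefficients (symbols assumed here, not defined in this document), it can be written as:

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}
\tag{2}
```

Here T is the number of leaves of a tree and w its vector of leaf weights, so γ penalizes the number of leaves and λ penalizes large leaf weights.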

The ensemble model in formula (2) takes functions as parameters and therefore cannot be optimized with traditional optimization methods; instead the model is trained iteratively. Let $\hat{y}_i^{(t)}$ be the prediction for the i-th sample at the t-th iteration; the objective function is then as shown in formula (3), where $f_t$ denotes the new tree created at the t-th iteration. Formula (3) is used to select the $f_t$ that most improves the model, and $f_t$ fits the residual between the prediction of the previous iteration and the true value.
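Formula (3) is likewise not reproduced in the text. Under the assumption that it takes the standard additive-training form, where the prediction of iteration t is the prediction of iteration t-1 plus the new tree, it reads:

```latex
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)
\tag{3}
```

Only the new tree is optimized at iteration t; the earlier trees are held fixed, so their regularization terms are constants and can be dropped.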

During gradient boosting, XGBoost optimizes the objective function using a second-order Taylor expansion; the simplest form reached is shown in formula (4), where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ are the first and second derivatives of the loss function, respectively. $I_j = \{ i \mid q(x_i) = j \}$ denotes the set of samples assigned to leaf node j.
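As a worked form of this step, and assuming the standard XGBoost derivation with the regularization coefficients γ and λ (assumed symbols), expanding the loss to second order around the previous prediction, dropping constant terms and grouping the samples by leaf gives:

```latex
g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right),
\qquad
h_i = \partial^{2}_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)

\widetilde{\mathrm{Obj}}^{(t)}
= \sum_{j=1}^{T} \left[ \Bigl( \sum_{i \in I_j} g_i \Bigr) w_j
+ \frac{1}{2} \Bigl( \sum_{i \in I_j} h_i + \lambda \Bigr) w_j^{2} \right] + \gamma T
\tag{4}
```

Each leaf weight appears in its own quadratic term, which is why the optimum can be found leaf by leaf in closed form.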

When the structure q(x) of a tree is given, the optimal weights of the leaf nodes are computed by formula (5), and the quality of the tree structure q(x) can be computed by formula (6).
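Assuming formulas (5) and (6) take their standard XGBoost form, $w_j^* = -G_j / (H_j + \lambda)$ and $\mathrm{Obj}^* = -\tfrac{1}{2} \sum_j G_j^2 / (H_j + \lambda) + \gamma T$, where $G_j$ and $H_j$ are the sums of the first and second derivatives over the samples in leaf j, the two quantities can be computed as in the sketch below. The gradient statistics are hypothetical numbers.

```python
from typing import List, Tuple


def optimal_leaf_weight(G: float, H: float, lam: float) -> float:
    """Closed-form leaf weight: w* = -G / (H + lambda)."""
    return -G / (H + lam)


def structure_score(leaves: List[Tuple[float, float]], lam: float, gamma: float) -> float:
    """Quality of a fixed tree structure (lower is better)."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)


# Hypothetical per-leaf sums (G_j, H_j) for a tree with two leaves.
leaves = [(-4.0, 3.0), (2.0, 1.0)]
weights = [optimal_leaf_weight(G, H, lam=1.0) for G, H in leaves]
score = structure_score(leaves, lam=1.0, gamma=0.5)
```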

It is usually impossible to enumerate all possible tree structures. The approach used is to start from the root node and add branches to the tree iteratively. Assume $I_L$ and $I_R$ are the sample sets of the left and right subtrees after a split, with $I = I_L \cup I_R$; formula (7) is used to evaluate a candidate split node.
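Assuming formula (7) is the standard XGBoost split gain, $\tfrac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$, a greedy scan over one feature can be sketched as follows. The function names and the gradient values are hypothetical, and the scan shown is the exact enumeration of thresholds rather than the approximate variant.

```python
from typing import List, Optional, Tuple


def split_gain(G_L: float, H_L: float, G_R: float, H_R: float,
               lam: float, gamma: float) -> float:
    """Loss reduction obtained by splitting a node into left and right children."""
    def term(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (term(G_L, H_L) + term(G_R, H_R) - term(G_L + G_R, H_L + H_R)) - gamma


def best_split(values: List[float], g: List[float], h: List[float],
               lam: float, gamma: float) -> Tuple[float, Optional[float]]:
    """Scan the sorted feature values and return (best gain, threshold)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    G, H = sum(g), sum(h)
    G_L = H_L = 0.0
    best: Tuple[float, Optional[float]] = (float("-inf"), None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += g[i]
        H_L += h[i]
        if values[i] == values[nxt]:
            continue  # no threshold can separate equal values
        gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if gain > best[0]:
            best = (gain, (values[i] + values[nxt]) / 2)
    return best


# Hypothetical feature values with per-sample gradient statistics.
values = [1.0, 2.0, 4.0, 5.0]
g = [-2.0, -2.0, 2.0, 2.0]
h = [1.5, 1.5, 1.5, 1.5]
gain, threshold = best_split(values, g, h, lam=1.0, gamma=0.5)
```

A split is only worth making when the gain is positive, which is how γ acts as a pruning threshold.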

In the embodiment of the present invention, the XGBoost-based transformer fault prediction model feeds the transformer data after feature selection into XGBoost. From the preprocessed data, XGBoost builds the first CART tree using the approximate algorithm among the split-finding algorithms. The samples are predicted with this tree, the predictions are compared with the true values to obtain residuals, and the residuals are used as new label information, together with the sample data, to build the next CART regression tree, which fits the residuals. Each added tree therefore further reduces the value of the loss function.
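The residual-fitting loop described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the embodiment itself: it uses a single feature, depth-1 trees, squared loss and a single output instead of the multi-class softmax setting, and the data and learning rate are hypothetical.

```python
from typing import List, Tuple


def fit_stump(x: List[float], residual: List[float]) -> Tuple[float, float, float]:
    """Fit a depth-1 regression tree to the residuals; return (threshold, w_left, w_right)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residual) if xi <= threshold]
        right = [r for xi, r in zip(x, residual) if xi > threshold]
        w_l = sum(left) / len(left)
        w_r = sum(right) / len(right)
        err = sum((r - (w_l if xi <= threshold else w_r)) ** 2
                  for xi, r in zip(x, residual))
        if best is None or err < best[0]:
            best = (err, threshold, w_l, w_r)
    _, threshold, w_l, w_r = best
    return threshold, w_l, w_r


def boost(x: List[float], y: List[float], n_trees: int = 5,
          eta: float = 0.5) -> Tuple[List[float], List[float]]:
    """Add one stump per round, each fitted to the current residuals."""
    pred = [0.0] * len(y)
    losses = []
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        threshold, w_l, w_r = fit_stump(x, residual)
        pred = [p + eta * (w_l if xi <= threshold else w_r) for xi, p in zip(x, pred)]
        losses.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)))
    return pred, losses


# Hypothetical one-feature data set.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.0, 1.0, 1.0]
pred, losses = boost(x, y)
```

On this toy data the squared error shrinks at every round, matching the statement that each added tree reduces the loss.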

Specifically, the XGBoost model comprises:

Loss function: XGBoost requires the loss function to be a differentiable convex loss function; merror (the multi-class classification error rate) and mlogloss (defined as the negative log-likelihood of the true labels given the predictions of a probabilistic classifier) are used.
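The two metrics named above can be computed directly from predicted class probabilities. The sketch below follows their usual definitions (fraction of misclassified samples, and mean negative log-probability of the true class); the probabilities are hypothetical and use three classes instead of eight for brevity.

```python
import math
from typing import List


def merror(y_true: List[int], proba: List[List[float]]) -> float:
    """Multi-class error rate: share of samples whose most probable class is wrong."""
    wrong = sum(
        1 for y, p in zip(y_true, proba)
        if max(range(len(p)), key=p.__getitem__) != y
    )
    return wrong / len(y_true)


def mlogloss(y_true: List[int], proba: List[List[float]], eps: float = 1e-15) -> float:
    """Multi-class log loss: mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for y, p in zip(y_true, proba)) / len(y_true)


# Hypothetical predictions for two samples.
y_true = [0, 2]
proba = [[0.5, 0.25, 0.25], [0.5, 0.25, 0.25]]
error_rate = merror(y_true, proba)
log_loss = mlogloss(y_true, proba)
```

The error rate only looks at the top-ranked class, while the log loss also rewards well-calibrated probabilities, so the two are usually monitored together.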

It will be apparent to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the invention.

In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity; those skilled in the art should take the specification as a whole, and the technical solutions in the various embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims

1. A power grid transformer fault classification method based on XGBoost, characterized by comprising the following steps:

Step 1: obtain and integrate the DGA data sets of multiple transformers;

Step 2: normalize the data in the DGA data set obtained in Step 1 and pass the preprocessed data to XGBoost for training; construct a certain number of classification and regression trees to fit the residual of the previous round of learning, and find the optimal XGBoost parameter combination by grid search, thereby improving the diagnostic accuracy for transformer faults.

2. The power grid transformer fault classification method based on XGBoost according to claim 1, characterized in that the XGBoost model specifically comprises:

Input: the input is the transformer data set after preliminary screening and normalization. The data set can be expressed as D = {X₁, X₂, X₃, …, X_d}, where d is the number of samples in the data set; X_i = {x₁, x₂, …, x_n, y} denotes one sample, x_i denotes the feature in each dimension, and y ∈ {0, 1, 2, 3, 4, 5, 6, 7} denotes the 8 fault types of the transformer;

Loss function: XGBoost requires the loss function to be a differentiable convex loss function.

3. The power grid transformer fault classification method based on XGBoost according to claim 1, characterized in that, in Step 1, the ratios of different gases are added to the DGA data set to enrich the number of features in the DGA data set.