CN116401545A

CN116401545A - Multimode fusion type turbine runout analysis method

Info

Publication number: CN116401545A
Application number: CN202310322168.3A
Authority: CN
Inventors: 王晓兰; 盛明珺; 刘守豹; 管毓瑶; 胡思宇; 刘洋成; 魏棕凯
Original assignee: China Datang Corp Science and Technology Research Institute Co Ltd; Datang Hydropower Science and Technology Research Institute Co Ltd
Current assignee: China Datang Corp Science and Technology Research Institute Co Ltd; Datang Hydropower Science and Technology Research Institute Co Ltd
Priority date: 2023-03-29
Filing date: 2023-03-29
Publication date: 2023-07-07

Abstract

The invention discloses a multimode fusion type turbine runout analysis method comprising the following steps: s1, acquiring historical runout-related data of the unit and preprocessing it; s2, performing feature engineering on the runout-related data to obtain a training set; s3, extracting the runout-related data in the training set and feeding it separately into an SVR model, a LightGBM model and an XGBoost model for training; s4, performing least-squares fitting on the results of the three models to obtain their weight distribution and form a fusion model; s5, inputting online-monitored runout-related data into the fusion model to obtain a predicted runout value, which is taken as the standard runout value under the current working condition; s6, comparing the collected runout data with the standard value, and marking abnormal data and their abnormality levels. The method takes multidimensional influencing factors and the operating conditions of the water turbine into account, trains three models separately, and ensures the accuracy of the prediction model through automatic weighting. This improves the accuracy and scientific rigor of the prediction.

Description

Multimode fusion type turbine runout analysis method

Technical Field

The invention belongs to the technical field of hydroelectric generator operation analysis, and particularly relates to a multimode fusion type hydraulic turbine runout analysis method.

Background

A hydroelectric generating set is a large rotating machine, and it inevitably vibrates in operation. This vibration cannot be completely avoided or eliminated, and severe vibration degrades power supply quality, operational safety and the service life of the unit. Several causes act together: coupled mechanical, hydraulic and electromagnetic factors, and the ageing of mechanical components. As a result, most faults of a hydroelectric generating set appear as runout, so the runout signal directly reflects the operating state of the unit.

The existing monitoring systems and online monitoring systems of hydroelectric generating sets track key indicators and set alarm limits for them. To avoid false alarms, however, these limits are set high, so a serious fault may already have occurred by the time a unit reaches an alarm limit. Even in the stable operating zone, each monitored indicator fluctuates with working conditions such as water head and excitation current. The rate of change computed directly from the monitored indicators therefore still cannot reflect the true state of the equipment. New applications of artificial intelligence, big data analysis and related technologies now make trend analysis possible through intelligent algorithms. Hydropower plants therefore need to move from traditional manual monitoring and manual decision-making toward informatized, automated and intelligent machine decision-making.

Defects and deficiencies of the prior art:

1. Most hydroelectric generating sets are now equipped with a considerable number of online monitoring systems. However, no standardized methods for operating, using and maintaining these systems have been established, and the systems receive insufficient attention. The acquired data also lack dedicated personnel for in-depth analysis and technical support from qualified specialists.

2. State monitoring and early warning for hydraulic generators currently rely on fixed thresholds and computed rates of change. This approach produces false alarms and delayed warnings.

3. Factors such as the operating environment and local impacts act on the unit together. Because of this coupling, the runout monitoring signals of hydroelectric generating sets are often complex, non-stationary and nonlinear. Existing methods struggle to achieve satisfactory accuracy when predicting the trend of these signals.

Disclosure of Invention

To address these defects of the prior art, the invention provides a multimode fused turbine runout analysis method. The method analyzes and predicts turbine runout data and raises accurate real-time alarms for abnormal runout values, thereby supporting decision-making in fault diagnosis.

The technical purpose of the invention is realized by the following technical scheme: a multimode fusion turbine runout analysis method specifically comprises the following steps:

s1, acquiring historical unit runout related data, and preprocessing the unit runout related data;

s11, collecting historical online and offline monitoring data, removing erroneous data and data irrelevant to runout, and assembling a preliminary data set for training;

s12, cleaning repeated data, zero value data and missing data in the data set, and resampling;

s2, further performing characteristic engineering on the preprocessed unit runout related data to obtain a training set;

s21, performing correlation analysis on the data set: computing the Pearson correlation coefficient between each feature attribute and the target attribute, and selecting the required features according to the correlation ranking;
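Step s21 can be sketched in plain Python as shown below. The sketch assumes the features are held as named columns aligned with the runout target. The cut-off `k` is a hypothetical parameter (the verification test later keeps 25 of 49 features), and `pearson` and `select_features` are illustrative names, not taken from the source.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def select_features(features, target, k):
    """Rank feature columns by |Pearson r| with the target and keep the top k.

    features: dict mapping a (hypothetical) feature name -> list of values
    target:   list of runout values aligned with the feature rows
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores
```

The features are ranked by the absolute value of r because a strongly negative correlation is just as informative as a strongly positive one.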

s22, for each feature column, computing its maximum value max and minimum value min;

s23, if min ≥ 0, normalize each column of data as follows:

$$x'_{i,j}=\frac{x_{i,j}-\min_i}{\max_i-\min_i}$$

where $x_{i,j}$ is the feature value in column i, row j; $x'_{i,j}$ is its normalized value; $\min_i$ is the minimum value of column i; and $\max_i$ is the maximum value of column i;

s24, if min < 0, normalize each column of data as follows:

s25, splitting the normalized data into a training set and a test set at a ratio of 8:2, and separating the feature values from the target values to obtain the training set;
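Steps s22 to s25 can be sketched as follows. The min ≥ 0 branch uses the min-max formula of s23. The source does not reproduce the s24 formula for min < 0, so the sketch substitutes a hypothetical stand-in that maps such columns to [-1, 1]. The chronological 8:2 split is also an assumption, because the source does not say whether the data are shuffled.

```python
def normalize_columns(rows):
    """Column-wise normalization following steps s22-s24.

    rows: list of equal-length lists, one list per sample.
    Columns with min >= 0 use the s23 min-max formula, mapping to [0, 1].
    ASSUMPTION: the s24 formula (min < 0) is not given in the source; this
    hypothetical stand-in maps such columns to [-1, 1].
    """
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)  # s22: per-column min and max
        span = hi - lo
        if span == 0:
            # constant column: a case the source does not cover
            scaled_cols.append([0.0] * len(col))
        elif lo >= 0:
            scaled_cols.append([(v - lo) / span for v in col])  # s23
        else:
            scaled_cols.append([2 * (v - lo) / span - 1 for v in col])  # s24 stand-in
    return [list(r) for r in zip(*scaled_cols)]


def split_train_test(rows, targets, train_ratio=0.8):
    """s25: split samples 8:2 into training and test sets.

    ASSUMPTION: time order is kept (no shuffling), so the test samples
    come after the training samples.
    """
    cut = int(len(rows) * train_ratio)
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]
```

In practice, the min and max of each column should be computed on the training data and reused for the test data and for online samples, so that the same scaling applies at prediction time.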

s3, extracting data related to runout in the training set, and respectively inputting an SVR model, a LightGBM model and an XGBoost model for training;

s4, performing least square fitting on the results of the three models to obtain weight distribution of the three models, and forming a fusion model;

s5, inputting the on-line monitoring runout related data into a trained fusion model to obtain a predicted runout value and taking the predicted runout value as a standard value of runout under the working condition;

s6, comparing the collected runout data with a standard value, and marking abnormal data and abnormal grades.

Preferably, in step S6, when the vibration and runout values are below 40 μm, an amplitude that exceeds the predicted value by 10 μm or more is judged a level-2 (secondary) amplitude abnormality. An amplitude that exceeds the predicted value by more than 20 μm is judged a level-1 (first-order) amplitude abnormality.

Preferably, in step S6, when the vibration and runout values are above 40 μm, an amplitude that exceeds the predicted value by 10% to 25% is judged a level-2 (secondary) amplitude abnormality. An amplitude that exceeds the predicted value by more than 25% is judged a level-1 (first-order) amplitude abnormality.
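The two preferred grading rules of step S6 can be combined into a single function, sketched below. Several details are assumptions not fixed by the source:

- the measured value is the one compared with the 40 μm limit;
- a value of exactly 40 μm falls under the relative rule;
- only excess over the predicted (standard) value is flagged;
- the 10 μm and 10% boundaries are inclusive.

The function name is hypothetical.

```python
def grade_abnormality(measured_um, predicted_um, abs_limit_um=40.0):
    """Return 0 (normal), 2 (level-2 / secondary) or 1 (level-1 / first-order).

    measured_um:  collected vibration or runout value, in micrometres
    predicted_um: fusion-model prediction used as the standard value
    """
    deviation = measured_um - predicted_um  # only excess over the standard value
    if measured_um < abs_limit_um:
        # small-amplitude regime: absolute thresholds in micrometres
        if deviation > 20.0:
            return 1
        if deviation >= 10.0:
            return 2
        return 0
    # large-amplitude regime: thresholds relative to the predicted value
    if predicted_um <= 0:
        raise ValueError("predicted value must be positive for the relative rule")
    ratio = deviation / predicted_um
    if ratio > 0.25:
        return 1
    if ratio >= 0.10:
        return 2
    return 0
```

For example, a measured 30 μm against a predicted 20 μm is a level-2 abnormality. A measured 100 μm against a predicted 70 μm (about 43% above the prediction) is level-1.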

Compared with the prior art, the invention has the following beneficial effects:

1. The multimode fusion water turbine runout analysis method provided by the invention takes multidimensional influencing factors and the operating conditions of the water turbine into account. It therefore obtains more accurate prediction results and provides more reliable alarm output.

2. The multimode fusion water turbine runout analysis method provided by the invention trains three models separately and ensures the accuracy of the prediction model through automatic weighting. This improves the accuracy and scientific rigor of the prediction.

3. The multimode fused turbine runout analysis method provided by the invention can be integrated in an online monitoring system to predict data in real time, and overcomes the limitation that the current online monitoring device only adopts a runout value out-of-limit mode to perform early warning.

Drawings

FIG. 1 is a flow chart of one embodiment of the present invention.

FIG. 2 is a schematic diagram of support vector regression with the SVR model in one embodiment of the present invention.

FIG. 3 is a graph of model predictions of turbine guide bearing runout in the X direction in one embodiment of the present invention.

FIG. 4 is a graph of model predictions of turbine guide bearing runout in the Y direction in one embodiment of the present invention.

FIG. 5 is a graph of model predictions of top cover vibration in the X direction in one embodiment of the present invention.

FIG. 6 is a graph of model predictions of top cover vibration in the Y direction in one embodiment of the present invention.

FIG. 7 is a graph of model predictions of top cover vibration in the Z direction in one embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As a preferred embodiment of the present invention, the present embodiment provides a multimode fused turbine runout analysis method, referring to FIG. 1, specifically comprising the following steps:

s1, acquiring historical unit runout related data, including possibly related data such as water head, power, excitation, guide vane opening and the like, and preprocessing the unit runout related data;

s24, if min < 0, normalize each column of data as follows:

s4, to increase the robustness of the model, least-squares fitting is performed on the results of the three models to obtain their weight distribution. A fusion model is then formed from these weights. This brings the predicted data close to the actual conditions and yields a model that can predict runout accurately;

In the above embodiment, SVR stands for support vector regression, the application of the SVM (support vector machine) to regression problems. SVR fits the samples with a linear function in the vector space. It takes the aggregate distance from the actual positions of all samples to this function as the loss, and it obtains the parameters of the function by minimizing that loss.

LightGBM (Light Gradient Boosting Machine) is a distributed gradient boosting framework based on decision tree algorithms. It was designed to meet industrial demands for shorter model computation time, and its design follows two main ideas. First, it reduces memory usage so that a single machine can process as much data as possible without sacrificing speed. Second, it reduces communication cost, which improves the efficiency of multi-machine parallel training and achieves linear speedup.

XGBoost, short for eXtreme Gradient Boosting, is an optimized distributed gradient boosting library designed to be efficient, flexible and portable. Building on gradient boosted trees, XGBoost redefines the loss function and the weak learner and improves how the boosting algorithm combines learners. This balances computation speed against model performance.

The training set obtained after feature engineering is fed into the SVR, LightGBM and XGBoost models for training. Each model thereby learns the relationship between the vibration data and vibration-related data such as water head, power, excitation and guide vane opening. The resulting models can then predict vibration values to a certain extent.

In the above embodiment, let the sample data be (x, y), with model output f(x) and true value y. A conventional regression model takes the difference between f(x) and y as the loss, which is zero only when f(x) equals y exactly. SVR instead sets a tolerance ε. It computes the absolute difference |f(x) − y| and incurs a loss only when this difference exceeds ε. Graphically, as shown in FIG. 2, a band of width ε is placed on each side of f(x), and values that fall between the two band boundaries are treated as correct.
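The difference between the conventional loss and the SVR loss described above can be shown in a few lines. This is a minimal illustration: the squared loss stands in for the "conventional regression model", and the function names are hypothetical.

```python
def squared_loss(y_true, y_pred):
    """Conventional regression loss: nonzero whenever f(x) != y."""
    return (y_true - y_pred) ** 2


def eps_insensitive_loss(y_true, y_pred, eps):
    """SVR loss: zero inside the tube |f(x) - y| <= eps, linear outside it."""
    return max(0.0, abs(y_pred - y_true) - eps)


def mean_eps_loss(ys, preds, eps):
    """Average epsilon-insensitive loss over a set of samples."""
    return sum(eps_insensitive_loss(y, p, eps) for y, p in zip(ys, preds)) / len(ys)
```

With ε = 0.5, a prediction of 10.4 for a true value of 10.0 costs nothing, whereas the squared loss still charges 0.16. A prediction of 11.0 costs 0.5, which is only the part of the error that lies outside the tube.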

SVR differs from conventional regression in that its derivation introduces slack variables ξ and ξ*, a penalty coefficient C and the ε-insensitive loss function. The derivation finally gives SVR in the following form:

$$f(x)=w\cdot\phi(x)+b$$

$$\min_{w,b,\xi,\xi^*}\ \frac{1}{2}\lVert w\rVert^2+C\sum_{i=1}^{N}(\xi_i+\xi_i^*)$$

$$\text{s.t.}\quad y_i-w\cdot\phi(x_i)-b\le\varepsilon+\xi_i,\qquad w\cdot\phi(x_i)+b-y_i\le\varepsilon+\xi_i^*,\qquad \xi_i,\ \xi_i^*\ge 0$$

where w is the weight vector in the high-dimensional space, b is the threshold (bias), $\phi(x_i)$ is a nonlinear mapping, and ε is the parameter of the insensitive loss function.

The term $\frac{1}{2}\lVert w\rVert^2$ keeps the fitted function flat, which further improves the generalization ability of the model. The penalty parameter C controls how strongly the method penalizes sample points whose error exceeds ε. It mainly balances the fitting accuracy of the model against its complexity and is normally a positive number. The parameter ε expresses the error tolerance required of the regression model. The introduced slack variables ξ and ξ* bound the output value from above and below.

When the problem cannot be solved in the original sample space, a kernel function $K(x_i,x_j)$ is introduced, and the solution becomes:

$$f(x)=\sum_{i=1}^{N}(\alpha_i-\alpha_i^*)K(x_i,x)+b$$

where $\alpha_i$ and $\alpha_i^*$ are the coefficients (Lagrange multipliers) determined by the solution, and $K(x_i,x_j)=\phi(x_i)\cdot\phi(x_j)$ is a symmetric positive-definite function.
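The kernel form of the SVR solution can be evaluated directly once the coefficients are known. The sketch below makes two assumptions. First, it uses a Gaussian (RBF) kernel, because the source does not name a kernel. Second, it takes the support vectors, $\alpha_i$, $\alpha_i^*$ and b as already given, since training them requires a quadratic-programming solver that is out of scope here.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)


def svr_predict(x, support_vectors, alpha, alpha_star, b, kernel=rbf_kernel):
    """Kernel SVR decision function f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(
        (a - a_s) * kernel(sv, x)
        for sv, a, a_s in zip(support_vectors, alpha, alpha_star)
    ) + b
```

Only samples with $\alpha_i-\alpha_i^*\ne 0$ (the support vectors) contribute to the sum. Samples that fall inside the ε-tube drop out of the prediction entirely.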

LightGBM is a boosting-type ensemble learning method and an efficient implementation of the gradient boosting decision tree (GBDT) framework. The GBDT algorithm proceeds as follows. Given the training set $\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, initialize the model as

$$F_0(x)=h_0(x)$$

where $h_0(x)$ is the first base learner chosen by the user, and set the number T of base learners to be trained. Each base learner t = 1, …, T is then computed as follows:

1) Compute the negative gradient $\tilde{y}_i$ of the current loss function:

$$\tilde{y}_i=-\left[\frac{\partial L(y_i,F(x_i))}{\partial F(x_i)}\right]_{F=F_{t-1}},\qquad i=1,\dots,N \qquad (1)$$

2) Fit the negative gradient to obtain the parameters of the current base learner $h_t$:

$$w_t=\arg\min_{w}\sum_{i=1}^{N}\left[\tilde{y}_i-h_t(x_i;w)\right]^2 \qquad (2)$$

3) Minimize the loss function to obtain the weight of the current base learner:

$$\alpha_t=\arg\min_{\alpha}\sum_{i=1}^{N}L\left(y_i,F_{t-1}(x_i)+\alpha h_t(x_i;w_t)\right) \qquad (3)$$

The final model $F_t(x)$ is the weighted sum of the base learners:

$$F_t(x)=F_{t-1}(x)+\alpha_t h_t(x;w_t) \qquad (4)$$

As formulas (1) to (4) show, the GBDT algorithm must traverse the entire training data several times in every iteration. Loading all of the training data into memory limits how much data can be used. Not loading it requires repeated reads and writes, which consume a great deal of computation time.
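Steps (1) to (4) can be made concrete with a toy gradient booster. This is a sketch under simplifying assumptions, not LightGBM's implementation:

- the loss is the squared loss $L=\frac{1}{2}(y-F)^2$, so the negative gradient is simply the residual;
- each base learner is a one-feature regression stump;
- $h_0$ is the constant mean of the targets.

All function names are hypothetical. Each round scans the full training set, which is the cost described above.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on one feature: (threshold, left, right)."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    if best is None:  # all x identical: fall back to a constant learner
        m = sum(targets) / len(targets)
        return (xs[0], m, m)
    return best[1:]


def stump_predict(stump, x):
    thr, lv, rv = stump
    return lv if x <= thr else rv


def gbdt_fit(xs, ys, n_rounds=20):
    """Gradient boosting with squared loss L = (y - F)^2 / 2."""
    f0 = sum(ys) / len(ys)  # initial learner h0: constant mean
    F = [f0] * len(ys)
    ensemble = []
    for _ in range(n_rounds):
        # (1) negative gradient of the squared loss = residual
        residuals = [y - f for y, f in zip(ys, F)]
        # (2) fit the base learner to the negative gradient
        stump = fit_stump(xs, residuals)
        h = [stump_predict(stump, x) for x in xs]
        # (3) line search for alpha_t (closed form for squared loss)
        denom = sum(v * v for v in h)
        alpha = sum(r * v for r, v in zip(residuals, h)) / denom if denom else 0.0
        # (4) additive update F_t = F_{t-1} + alpha_t * h_t
        F = [f + alpha * v for f, v in zip(F, h)]
        ensemble.append((alpha, stump))
    return f0, ensemble


def gbdt_predict(model, x):
    f0, ensemble = model
    return f0 + sum(a * stump_predict(s, x) for a, s in ensemble)
```

Real GBDT libraries use deeper trees over many features, shrinkage (a learning rate) and other losses. The loop structure, however, follows formulas (1) to (4).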

To solve this problem, LightGBM adds several optimizations to the traditional GBDT algorithm:

- feature histograms;
- gradient-based one-side sampling;
- exclusive feature bundling;
- a leaf-wise tree growth strategy.

These optimizations give the algorithm faster training and lower memory consumption. LightGBM is therefore better suited to processing massive data, which matches runout data, whose volume is very large.
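The feature-histogram idea mentioned above can be sketched as follows. A continuous feature is bucketed into a fixed number of bins, with the gradient sum and sample count accumulated per bin. Only the bin boundaries are then scanned as split candidates, instead of every distinct value. The sketch makes three simplifying assumptions: equal-width bins, squared loss (every Hessian equals 1) and no regularization. LightGBM's actual binning and gain computation are more elaborate, and the function names are hypothetical.

```python
def build_histogram(values, gradients, n_bins, lo, hi):
    """Bucket a feature into n_bins equal-width bins on [lo, hi] and
    accumulate the gradient sum and sample count of each bin."""
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    width = (hi - lo) / n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def best_histogram_split(grad_sum, count):
    """Scan the n_bins - 1 bin boundaries and return (best_bin, gain), with
    gain = G_L^2/n_L + G_R^2/n_R - G^2/n for squared loss (Hessian = 1)."""
    G, N = sum(grad_sum), sum(count)
    best_bin, best_gain = None, 0.0
    gl, nl = 0.0, 0
    for b in range(len(grad_sum) - 1):
        gl += grad_sum[b]
        nl += count[b]
        nr = N - nl
        if nl == 0 or nr == 0:
            continue
        gain = gl ** 2 / nl + (G - gl) ** 2 / nr - G ** 2 / N
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain
```

Once the histogram is built, finding a split costs O(number of bins) rather than O(number of samples). This is where the reduction in training time and memory comes from.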

XGBoost is an ensemble learning method that improves model performance by iteratively adding weak learners. In each iteration, XGBoost adds a new model to the existing one. The new model fits the residual between the previous model's predictions and the true values, which yields better predictions. For runout prediction, XGBoost uses regression trees as base learners, and the tree ensemble can be expressed as:

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i),\qquad f_k\in R$$

where $x_i$ is the i-th input feature vector; $\hat{y}_i$ is the predicted runout value of the i-th sample; K is the number of regression trees; R is the space of regression trees; and $f_k$ is a function in R, that is, the output of a base learner.

Accumulating the results of the iterations, the objective function of XGBoost can be written as:

$$Obj=\sum_{i=1}^{n}l(y_i,\hat{y}_i)+\sum_{k=1}^{K}\Omega(f_k)$$

where $l(y_i,\hat{y}_i)$ is the error between the predicted and true results, and $\sum_{k}\Omega(f_k)$ is the regularization term of the objective function, with $\Omega(f_k)$ given by:

$$\Omega(f_k)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^2$$

where T is the number of leaf nodes; γ is the penalty coefficient that controls the number of leaf nodes; $\omega_j$ is the weight of leaf node j; and λ is the regularization penalty coefficient. Finally, the iterative form of XGBoost is substituted and a second-order Taylor expansion is taken at $f_k=0$, which gives the optimal objective function value.
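The second-order expansion mentioned above leads to well-known closed forms, although the source does not write them out. With $G_j$ and $H_j$ denoting the sums of first and second derivatives of the loss over the samples in leaf j, the results are:

- optimal leaf weight: $\omega_j^*=-G_j/(H_j+\lambda)$;
- optimal objective: $-\frac{1}{2}\sum_j G_j^2/(H_j+\lambda)+\gamma T$.

The sketch below assumes the squared loss $l=\frac{1}{2}(y-\hat{y})^2$. The function names are hypothetical, not XGBoost's API.

```python
def squared_loss_grad_hess(y, y_hat):
    """g_i and h_i of l = (y - y_hat)^2 / 2 at the previous prediction y_hat."""
    return y_hat - y, 1.0


def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda) from the 2nd-order expansion."""
    return -G / (H + lam)


def structure_score(leaves, lam, gamma):
    """Optimal objective -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    for a tree whose leaves have (gradient sum, Hessian sum) pairs."""
    T = len(leaves)
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * T
```

For two samples with y = 1 and y = 2 that share one leaf, starting from predictions of 0, the derivatives give G = -3 and H = 2. With λ = 1, the leaf weight is 1.0. With γ = 0.5, the structure score is -1.0. The λ term shrinks the leaf weight below the plain mean residual of 1.5.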

In the least-squares fitting of the results of the three models, let $y_1$ denote the SVR prediction, $y_2$ the LightGBM prediction and $y_3$ the XGBoost prediction. The prediction y' of the fused model is then defined as:

$$y'=\beta y_1+\gamma y_2+\lambda y_3$$

where β, γ and λ are the shares of the combined weight assigned to the SVR, LightGBM and XGBoost models respectively, subject to:

$$\beta+\gamma+\lambda=1$$

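The combined weights are obtained as the least-squares optimum over the N samples:

$$\min_{\beta,\gamma,\lambda}\ \sum_{i=1}^{N}\left(y_i-\beta y_{1,i}-\gamma y_{2,i}-\lambda y_{3,i}\right)^2\qquad\text{s.t.}\quad\beta+\gamma+\lambda=1$$

where $y_i$ is the true runout value of sample i, and $y_{1,i}$, $y_{2,i}$ and $y_{3,i}$ are the predictions of the three models for that sample.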
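The constrained least-squares fit can be solved in closed form. Substituting λ = 1 − β − γ reduces it to an ordinary two-variable least-squares fit of $(y-y_3)$ on $(y_1-y_3)$ and $(y_2-y_3)$, which the 2×2 normal equations solve. The sketch does not force the weights to be non-negative, since the source does not state such a constraint; this is an assumption. The function names are hypothetical.

```python
def fusion_weights(y1, y2, y3, y):
    """Least-squares weights (beta, gamma, lam) subject to beta + gamma + lam = 1.

    y1, y2, y3: SVR, LightGBM and XGBoost predictions on the same samples
    y:          true runout values
    """
    a = [p - r for p, r in zip(y1, y3)]  # regressor for beta
    c = [q - r for q, r in zip(y2, y3)]  # regressor for gamma
    t = [v - r for v, r in zip(y, y3)]   # target after substitution
    saa = sum(u * u for u in a)
    scc = sum(u * u for u in c)
    sac = sum(u * w for u, w in zip(a, c))
    sat = sum(u * w for u, w in zip(a, t))
    sct = sum(u * w for u, w in zip(c, t))
    det = saa * scc - sac * sac          # determinant of the normal equations
    if abs(det) < 1e-12:
        raise ValueError("model predictions are (nearly) collinear")
    beta = (sat * scc - sct * sac) / det  # Cramer's rule
    gamma = (saa * sct - sac * sat) / det
    return beta, gamma, 1.0 - beta - gamma


def fuse(y1, y2, y3, weights):
    """Fused prediction y' = beta*y1 + gamma*y2 + lam*y3."""
    beta, gamma, lam = weights
    return [beta * p + gamma * q + lam * r for p, q, r in zip(y1, y2, y3)]
```

If the three models' predictions are nearly collinear, the weights are poorly determined. This is one reason the weights may vary between feature sets and data splits.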

In a specific verification test, the turbine guide bearing runout and the top cover vibration of the water turbine were selected as prediction targets. Forty-nine relevant features were screened based on expert experience. Pearson correlation analysis was then used to select the 25 most highly correlated features as the feature set. The historical data were split into training and test data, and the SVR, LightGBM and XGBoost models were each trained and tested. To increase robustness, the three models were then weight-fused by the least-squares method. The resulting fused weight distribution is shown in the following table:

For plotting, 100 points were selected from the prediction results of each model. The experimental results are shown in FIGS. 3-7.

FIGS. 3-7 show that the test-set predictions of all four models (the three individual models and the fusion model) follow the trend of the true values. This indicates that all four methods predict runout well. To further evaluate the models, they were scored with the coefficient of determination R2. The R2 score reflects the proportion of the total variation of the dependent variable that the independent variables can explain through the regression relationship:

$$R^2=1-\frac{\sum_{i}(y_i-\hat{y}_i)^2}{\sum_{i}(y_i-\bar{y})^2}=1-\frac{MSE}{Var}$$

where $y_i$ is the actual observed value, $\bar{y}$ is the mean of the true observations, $\hat{y}_i$ is the predicted value, MSE is the mean squared error, and Var is the variance.
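The R2 formula above translates directly into code. The minimal version below is written from that definition (the function name is illustrative). MSE/Var equals the ratio of the residual sum of squares to the total sum of squares because both share the same factor 1/n.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot = 1 - MSE / Var."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction scores 1.0. Predicting the mean everywhere scores 0.0. A prediction worse than the mean gives a negative score, for example -3.0 for predicting [3, 2, 1] when the truth is [1, 2, 3].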

R2 is at most 1 and, for a reasonable model, lies between 0 and 1. R2 = 1 means that the predicted values equal the true values in the sample exactly, with no error. The closer R2 is to 1, the better the independent variables explain the dependent variable in the regression analysis, so a larger R2 generally indicates a better model fit [12]. The results are shown in the following table:

As the table shows, the R2 scores of the three individual methods exceed 0.95 for both turbine guide bearing runout and top cover vibration. This indicates that all three models can predict the vibration values effectively. By comparison, the fusion model obtained through least-squares fusion reaches an R2 score above 0.98, a clear improvement in prediction accuracy.

When least-squares fitting is applied to the results of the three models, the weights of the SVR, LightGBM and XGBoost models may differ slightly depending on the feature set and on the training and test data. For a fixed feature set and fixed training and test data, however, the method yields an exact and unique weight distribution. This is the main reason that prediction accuracy improves after model fusion, and it is the method's contribution to water turbine runout analysis and prediction.

In this embodiment, several models are trained separately and then fused, and an automatic weighting scheme yields a fusion model for analyzing and predicting water turbine runout. This ensures the accuracy of the prediction model. The method takes multidimensional influencing factors and the operating conditions of the water turbine into account, including but not limited to runout-related data such as water head, power, excitation and guide vane opening. It therefore obtains more accurate predictions and more reliable alarm output. The data can be predicted in real time, which overcomes the limitation that current online monitoring devices issue early warnings only when a runout value exceeds its limit.

In some embodiments, building on the above embodiments, in step S6, the vibration and runout values may be below 40 μm. In that case, an amplitude exceeding the predicted value by more than 10 μm is judged a level-2 (secondary) amplitude abnormality. An amplitude exceeding the predicted value by more than 20 μm is judged a level-1 (first-order) amplitude abnormality.

In other embodiments, building on the above embodiments, in step S6, the vibration and runout values may be above 40 μm. In that case, an amplitude exceeding the predicted value by 10% to 25% is judged a level-2 (secondary) amplitude abnormality. An amplitude exceeding the predicted value by more than 25% is judged a level-1 (first-order) amplitude abnormality.

The scope of the present invention is not limited to the examples given herein. All prior art that does not contradict the scope of the present invention, including but not limited to prior patent documents and prior publications, may be used.

In addition, the combinations of the technical features described in the present invention are not limited to those recited in the claims or in the specific embodiments. All technical features described herein may be freely combined in any manner unless they contradict one another.

The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims

1. A multimode fusion turbine runout analysis method is characterized by comprising the following steps:

s24, if min < 0, normalize each column of data as follows:

2. The multi-model fusion water turbine runout analysis method according to claim 1, wherein in step S6, when the vibration and runout values are below 40 μm, an amplitude exceeding the predicted value by more than 10 μm is judged a level-2 (secondary) amplitude abnormality, and an amplitude exceeding the predicted value by more than 20 μm is judged a level-1 (first-order) amplitude abnormality.

3. The multimode fused turbine runout analysis method according to claim 1, wherein in step S6, when the vibration and runout values are above 40 μm, an amplitude exceeding the predicted value by 10% to 25% is judged a level-2 (secondary) amplitude abnormality, and an amplitude exceeding the predicted value by more than 25% is judged a level-1 (first-order) amplitude abnormality.