CN111507412A

CN111507412A - Voltage missing value filling method based on historical data auxiliary scene analysis

Info

Publication number: CN111507412A
Application number: CN202010311551.5A
Authority: CN
Inventors: 陈光宇; 叶永康; 郝思鹏; 吕干云; 李干; 黄良灿
Original assignee: Nanjing Institute of Technology
Current assignee: Nanjing Institute of Technology
Priority date: 2020-04-20
Filing date: 2020-04-20
Publication date: 2020-08-07
Anticipated expiration: 2040-04-20
Also published as: CN111507412B

Abstract

The invention discloses a voltage missing value filling method based on historical data auxiliary scene analysis, which comprises the following steps: S1, acquiring historical data of the power grid; S2, calculating the fluctuation cross-correlation coefficient between each known attribute data and the missing attribute data through a fluctuation cross-correlation analysis algorithm; S3, screening out the attribute data with strong fluctuation cross-correlation; S4, calculating a combined weight; S5, performing scene analysis on the missing date and searching for similar scenes in the historical data of the power grid; S6, within the similar scenes, measuring the similarity of the data of the other attributes over the missing time period through the dynamic time warping distance; S7, calculating the comprehensive similarity by combining the combined weight; and S8, finding the date with the highest comprehensive similarity, and filling the missing attribute data by combining the data at the same moment on that date with the horizontal data. The method makes full use of the historical data of voltage-related attributes to fill voltage missing values and improves the accuracy of the filled voltage values.

Description

Voltage missing value filling method based on historical data auxiliary scene analysis

Technical Field

The invention relates to a historical data assisted scene analysis-based voltage missing value filling method, and belongs to the voltage identification technology of a power system.

Background

With the continuous development of power grids, their scale increases year by year. In the field of regulation and control, the accuracy and integrity of data are particularly important for power grid control. However, as the volume of collected data grows exponentially, voltage data is occasionally lost because of manual entry errors and faults in acquisition devices, so the lost data needs to be identified or filled in. Traditional methods such as the Expectation-Maximization (EM) algorithm and the K-Nearest Neighbors (KNN) algorithm provide solutions, but because they use little historical data as a basis for analysis, the filling effect is not ideal. In recent years, research interest in big data has risen worldwide, and big data technology has brought new momentum to the development of the smart grid with good results. A voltage missing value filling method based on historical data auxiliary scene analysis is therefore provided, which further improves the filling precision of voltage missing values and meets the development requirements of the power grid.

Disclosure of Invention

The purpose of the invention is as follows: in order to overcome the defects in the prior art, the invention provides a voltage missing value filling method based on historical data auxiliary scene analysis, so that the precision of the filled data is improved and the development requirements of the power grid are met.

The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:

a voltage missing value filling method based on historical data auxiliary scene analysis comprises the following steps:

s1, acquiring historical data of the power grid, and entering the step S2;

s2, calculating the fluctuation cross-correlation coefficient of each known attribute data and the missing attribute data at the same time through a fluctuation cross-correlation analysis algorithm, and entering the step S3;

s3, if the fluctuation cross-correlation coefficient of a certain known attribute data and the missing attribute data exceeds the comparison threshold, keeping the known attribute data, and entering the step S4; otherwise, abandoning the known attribute data;

s4, calling the attribute corresponding to the M reserved known attribute data as a Know attribute, calling the attribute corresponding to the missing attribute data as an Unknow attribute, and respectively calculating the combined weight of each Know attribute and the Unknow attribute;

s5, carrying out scene analysis on the dates containing the Unknow attributes, and searching H dates of the most similar scenes in the historical data of the power grid; the date containing the Unknow attribute is called a missing date, and the dates of the found H most similar scenes are called H similar dates;

S6, determining the time period of the missing attribute data in the missing date, and, for the same time period of each similar date, measuring the similarity between each Know attribute data of the missing date and each Know attribute data of that similar date through the dynamic time warping distance;

S7, calculating the comprehensive similarity of the Unknow attribute for each similar date by combining the combined weight of each Know attribute and the Unknow attribute;

S8, finding the date with the highest comprehensive similarity of the Unknow attribute, and filling the missing attribute data by combining the data at the same moment on that date with the horizontal data.

The invention starts from the historical data of the power grid and makes full use of the correlation among the attribute data in the power grid. Attribute data with stronger correlation is selected as the reference basis for filling the missing attribute data, and the combined weight is calculated to further quantify the degree of correlation between attributes, so that strongly correlated attribute data is used to a greater degree. At the same time, the similarity between the data of each attribute at the missing moment and the historical data is measured through the dynamic time warping distance, and, together with the combined weight, the data at the moment most similar to the missing moment is found to replace the data at the missing moment. The method makes full use of the correlation between the missing attribute data and the other attribute data to solve the problem of filling the missing attribute data, and improves the accuracy of filling the missing attribute data.

Specifically, in step S1, the historical data of the power grid is derived from voltage data detection, bus balance detection, constraint preprocessing, proportion anomaly detection, initial power flow precision detection, and the like, and the historical data of the power grid needs to be preprocessed, suspected error data is selected, and whether subsequent optimization calculation can be performed is determined.

Specifically, in step S2, the calculation process of the fluctuating cross-correlation coefficient is as follows:

S21, taking two equal-length time series x_i and y_i, where i＝1,2,…,N;

S22, calculating the sums of the differences of x_i and y_i from their means:

Δx(l)＝Σ_{i=1..l}(x_i-x̄), Δy(l)＝Σ_{i=1..l}(y_i-ȳ)

wherein: l represents the sampling length, Δx(l) and Δy(l) represent the sums of the differences of x_i and y_i from their means at sampling length l, and x̄ and ȳ respectively represent the averages of x_i and y_i;

S23, calculating the forward differences of the autocorrelation of x_i and y_i:

Δx(l,l₀)＝x(l₀+l)-x(l₀),l₀＝1,2,…,N-l

Δy(l,l₀)＝y(l₀+l)-y(l₀),l₀＝1,2,…,N-l

wherein: for each sampling length l＝1,2,…,N-1 there are N-l differences, indexed by l₀; Δx(l,l₀) and Δy(l,l₀) respectively represent the forward differences of the autocorrelation of x_i and y_i;

S24, calculating the covariance of x_i and y_i, wherein Cov_xy(l) denotes the covariance of x_i and y_i at sampling length l, taken over the forward differences Δx(l,l₀) and Δy(l,l₀) and their averages;

S25, calculating the fluctuation cross-correlation coefficient of x_i and y_i: if x_i and y_i are correlated to some degree, Cov_xy(l) satisfies a power-law distribution:

Cov_xy(l)～l^(h_xy)

wherein: h_xy denotes the degree of correlation between x_i and y_i, i.e. the fluctuation cross-correlation coefficient, and is obtained by fitting the power-law distribution; when h_xy＝0, x_i and y_i are not correlated; when h_xy＞0, x_i and y_i are positively correlated; when h_xy＜0, x_i and y_i are negatively correlated; the larger the value of h_xy, the higher the degree of correlation between x_i and y_i.
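Steps S21 to S25 can be sketched in code. The following is a minimal illustration rather than the patented implementation: the covariance formula of S24 is not reproduced in this text, so the sketch assumes it is the ordinary covariance of the two forward-difference series over l₀, and it assumes the power-law exponent is obtained by a least-squares line fit in log-log coordinates. Taking the absolute value of the covariance before the logarithm is also an assumption of the sketch, so the sign of the correlation is not reflected in the returned value. The function names are hypothetical.

```python
import math


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    if len(xs) < 2:
        raise ValueError("not enough non-zero covariance values to fit")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    den = sum((a - mx) ** 2 for a in xs)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / den


def fluctuation_cross_correlation(x, y):
    """Estimate h_xy from a log-log fit of |Cov_xy(l)| against l."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two equal-length series with at least 4 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # S22: cumulative sums of deviations from the mean (index 0 holds 0).
    px, py = [0.0], [0.0]
    for xi, yi in zip(x, y):
        px.append(px[-1] + xi - mean_x)
        py.append(py[-1] + yi - mean_y)
    log_l, log_cov = [], []
    for l in range(1, n):
        # S23: forward differences for l0 = 1, ..., N - l.
        dx = [px[l0 + l] - px[l0] for l0 in range(1, n - l + 1)]
        dy = [py[l0 + l] - py[l0] for l0 in range(1, n - l + 1)]
        mdx, mdy = sum(dx) / len(dx), sum(dy) / len(dy)
        # S24 (assumed form): covariance of the two difference series.
        cov = sum((a - mdx) * (b - mdy) for a, b in zip(dx, dy)) / len(dx)
        if cov != 0:
            log_l.append(math.log(l))
            log_cov.append(math.log(abs(cov)))
    # S25: the slope of the log-log line is taken as the exponent h_xy.
    return _slope(log_l, log_cov)
```

Under these assumptions, rescaling one of the series by a constant only shifts the log-log line and leaves the fitted exponent unchanged.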

Because there are many attributes, a comparison threshold is set for the fluctuation cross-correlation coefficient in order to prevent attribute data with low correlation from influencing the filling result of the missing attribute data. If the fluctuation cross-correlation coefficient between a known attribute data and the missing attribute data is lower than the comparison threshold, the known attribute data is considered to have little or no reference value and is discarded. After the threshold comparison, M attribute data remain, and the corresponding attributes are referred to as the M Know attributes, numbered from 1 to M. Since each attribute data correlates with the missing attribute data to a different degree, its reference value and utilization value also differ, and a combination weight with respect to the missing attribute needs to be set to ensure full and reasonable use of the historical data.

Specifically, the larger the fluctuation cross-correlation coefficient, the stronger the correlation between the known attribute data and the missing attribute data, and the higher the reference value of the known attribute data when the missing attribute data is filled, so its weight should be higher; in step S4, the combination weight w_j of the Know attribute j and the Unknow attribute is calculated by the following formula:

wherein: M denotes the number of Know attributes (i.e. the number of attributes corresponding to the retained known attribute data), j＝1,2,…,M, and c_j represents the fluctuation cross-correlation coefficient of the Know attribute j and the Unknow attribute (that is, the fluctuation cross-correlation coefficient between the known attribute data corresponding to the Know attribute j and the missing attribute data corresponding to the Unknow attribute).
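The screening of step S3 and the weighting of step S4 can be illustrated together. The weight formula is not reproduced in this text; the sketch below assumes the simplest form consistent with the description, namely that each retained coefficient c_j is divided by the sum of the M retained coefficients so that the weights add up to 1. The attribute names, coefficient values, and threshold are hypothetical.

```python
def combination_weights(coefficients, threshold):
    """Keep attributes whose coefficient exceeds the threshold (S3) and
    normalize the kept coefficients into weights (S4, assumed form)."""
    kept = {name: c for name, c in coefficients.items() if c > threshold}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no attribute exceeds the comparison threshold")
    return {name: c / total for name, c in kept.items()}


# Hypothetical coefficients of four known attributes against bus voltage.
example = {
    "active_load": 0.6,
    "reactive_load": 0.3,
    "current": 0.1,
    "temperature": 0.05,
}
weights = combination_weights(example, threshold=0.08)
```

With this normalization, an attribute with a larger fluctuation cross-correlation coefficient always receives a larger weight, which matches the requirement stated above.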

Specifically, in step S5, performing scene analysis on the date containing the Unknow attribute includes the following steps:

S51, classifying the historical data of the power grid into scenes according to the daily load condition; inputting the date containing the Unknow attribute and analyzing its daily load condition; because the historical data is huge in volume and low in value density, traversing all of it would be inefficient and of little benefit; the daily load condition is therefore analyzed, that is, each date is judged and classified as a working day, a general rest day, or a special holiday;

S52, judging whether the date is a working day: if yes, the scene of the date is determined as a working day, and the process goes to step S54; otherwise, go to step S53;

S53, judging whether the date is a special holiday: if yes, the scene of the date is determined as a special holiday, and the process goes to step S54; otherwise, the scene of the date is determined as a general rest day, and the process goes to step S54;

S54, searching the historical data of the power grid for the H dates with the most similar scene, namely searching for H working days, H special holidays, or H general rest days, according to the scene of the date.

Note on special holidays: nationally designated festival holidays such as New Year's Day, the Spring Festival, the Qingming Festival, Labor Day, the Dragon Boat Festival, the Mid-Autumn Festival and National Day are special holidays.
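The decision flow of S52 to S54 can be sketched as follows. This is an illustration under stated assumptions: the holiday calendar is passed in by the caller (the sets `special_holidays` and `makeup_workdays` are hypothetical inputs), and because the text does not say how the H most similar dates are ranked within a scene, the sketch simply prefers the dates closest in calendar time to the missing date.

```python
from datetime import date, timedelta

WORKING_DAY = "working day"
GENERAL_REST_DAY = "general rest day"
SPECIAL_HOLIDAY = "special holiday"


def classify_scene(day, special_holidays, makeup_workdays=frozenset()):
    """Classify a date following the S52 -> S53 order."""
    # S52: working-day check first (weekday that is not a holiday,
    # or a weekend day designated as a make-up working day).
    is_weekday = day.weekday() < 5
    if day in makeup_workdays or (is_weekday and day not in special_holidays):
        return WORKING_DAY
    # S53: special holiday, otherwise general rest day.
    if day in special_holidays:
        return SPECIAL_HOLIDAY
    return GENERAL_REST_DAY


def find_similar_dates(missing_day, history_days, h, special_holidays,
                       makeup_workdays=frozenset()):
    """S54: return up to h historical dates sharing the missing date's scene.

    Ranking by calendar distance is an assumption of this sketch.
    """
    scene = classify_scene(missing_day, special_holidays, makeup_workdays)
    same_scene = [
        d for d in history_days
        if d != missing_day
        and classify_scene(d, special_holidays, makeup_workdays) == scene
    ]
    same_scene.sort(key=lambda d: abs((missing_day - d).days))
    return same_scene[:h]
```

Restricting the search to dates of the same scene avoids traversing the whole history, which is the motivation given in S51.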

Specifically, in step S6, measuring the similarity between each Know attribute data of the missing date and each Know attribute data of each similar date by the dynamic time warping distance includes the following steps:

S61, because the dynamic time warping distance measures the similarity of two time series while the data is missing only at a certain moment, the moment at which the missing attribute data occurs is set as t_n; n time points are selected after t_n (i.e. t_(n+1), t_(n+2), …, t_(2n)) and n time points are selected before t_n (i.e. t_(n-1), t_(n-2), …, t₀), finally forming the time period (t₀, t_2n) of the missing attribute data in the missing date, which contains t₀, t₁, t₂, …, t_2n, a total of 2n+1 time points; the M Know attributes retained after screening against the comparison threshold are denoted A₁, A₂, …, A_M, and the Unknow attribute is denoted A₀;

S62, the attribute data of the Know attributes A₁, A₂, …, A_M at times t₀, t₁, t₂, …, t_2n on the h-th similar date are respectively recorded as D_(1,h), D_(2,h), …, D_(M,h), where D_(j,h)＝{d_(j,h,0), d_(j,h,1), …, d_(j,h,2n)} and d_(j,h,g) represents the attribute data of the Know attribute j at time t_g on the h-th similar date, j＝1,2,…,M, h＝1,2,…,H, g＝0,1,2,…,2n;

S63, measuring, through the dynamic time warping distance, the similarity between the attribute data D_(j,h) of the Know attribute A_j at times t₀, t₁, t₂, …, t_2n on the h-th similar date and the attribute data D_(j,p) at times t₀, t₁, t₂, …, t_2n on the missing date, giving one similarity value for each pair (j, h), where p represents the missing date.
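The window construction of S61 and the distance of S63 can be sketched with the textbook dynamic-programming form of the dynamic time warping distance. The choice of the absolute difference as the point-wise cost and the helper names `window` and `dtw_distance` are assumptions of this sketch; the text does not specify them.

```python
def window(series, index, n):
    """S61: the 2n+1 samples centred on position `index` of a daily series."""
    if index - n < 0 or index + n >= len(series):
        raise ValueError("window does not fit inside the series")
    return series[index - n:index + n + 1]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A smaller distance means the two windows are more alike. For each Know attribute j, `dtw_distance` would be evaluated between the window D_(j,p) of the missing date and the window D_(j,h) of each similar date.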

Specifically, in step S7, the comprehensive similarity of the Unknow attribute on each similar date is calculated by the following formula:

wherein: C_h represents the comprehensive similarity of the Unknow attribute on the h-th similar date.
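One way step S7 could be realized is shown below. The formula for C_h is not reproduced in this text, so everything here is an assumption: each dynamic time warping distance is mapped to a similarity 1/(1+distance), and C_h is taken as the weighted sum of these similarities using the combination weights w_j. The function names are hypothetical.

```python
def comprehensive_similarity(weights, distances):
    """Weighted sum of per-attribute similarities for one similar date.

    `weights` maps attribute name -> w_j; `distances` maps attribute
    name -> DTW distance. The mapping 1 / (1 + d) is an assumed
    conversion from distance to similarity.
    """
    return sum(w / (1.0 + distances[name]) for name, w in weights.items())


def best_similar_date(weights, distances_by_date):
    """Return the date key with the highest comprehensive similarity."""
    return max(
        distances_by_date,
        key=lambda d: comprehensive_similarity(weights, distances_by_date[d]),
    )
```

Under this assumed form, a similar date scores highly only when the strongly weighted Know attributes follow the missing date closely.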

Specifically, the historical data of an attribute at the same time point every day is taken as the longitudinal historical data section of that attribute, and the transverse historical data is obtained by dividing the data of the same day according to attribute. The filling strategy makes full use of the longitudinal historical data; because the missing attribute data is related not only to the longitudinal historical data but also to the transverse historical data, a filling value obtained by combining the two is closer to the true value. In step S8, after the date with the highest comprehensive similarity of the Unknow attribute is found, the data T₁ of the Unknow attribute at time t_n on that date is extracted as the longitudinal filling data; meanwhile, linear fitting of the curve of the Unknow attribute is used to obtain the data T₂ at time t_n as the transverse filling data, and the final filling value of the missing attribute data is solved as:

T＝α×T₁+β×T₂

α+β＝1

wherein: t_n is the moment at which the missing attribute data occurs, α is the weight of T₁, and β is the weight of T₂.
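The final combination can be sketched as follows. The text does not state which points the linear fit uses, so the sketch assumes a least-squares line through the known samples of the Unknow attribute around t_n on the missing date, evaluated at t_n to give T₂. The function names and the sample values in the usage are hypothetical.

```python
def linear_fit_value(times, values, t):
    """Least-squares line through (times, values), evaluated at t."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mt, mv = sum(times) / n, sum(values) / n
    den = sum((x - mt) ** 2 for x in times)
    slope = sum((x - mt) * (v - mv) for x, v in zip(times, values)) / den
    return mv + slope * (t - mt)


def fill_value(t1, t2, alpha):
    """T = alpha * T1 + beta * T2 with alpha + beta = 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * t1 + beta * t2


# Hypothetical usage: voltage (kV) known at sample indices 0, 1, 3, 4 of
# the missing date, missing at index 2; T1 comes from the best similar date.
t2 = linear_fit_value([0, 1, 3, 4], [10.0, 10.2, 10.6, 10.8], 2)
filled = fill_value(10.0, t2, alpha=0.5)
```

Choosing α closer to 1 trusts the most similar historical date more, while α closer to 0 trusts the local trend of the missing date more.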

Beneficial effects: the voltage missing value filling method based on historical data auxiliary scene analysis makes full use of the historical data of the power grid and improves the accuracy of missing attribute data filling; the invention establishes the relationship between attributes through the fluctuation cross-correlation analysis algorithm, quantifies the degree of correlation by introducing a combined weight, measures the similarity between the data at the missing moment and the historical data through the dynamic time warping distance, and finally selects the data at the most similar moment to replace the missing data, completing the filling of the missing data.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic flow diagram of the fluctuation cross-correlation algorithm;

FIG. 3 is a schematic flow chart of a scene analysis process;

FIG. 4 is a schematic flow chart of similarity calculation;

FIG. 5 is a schematic flow chart of the integrated similarity calculation;

FIG. 6 is a schematic diagram of a missing value padding process;

FIG. 7 is a comparison graph of filling accuracy for different algorithms;

FIG. 8 is a comparison graph of the filling-up results and the true values of the algorithm proposed by the present invention.

Detailed Description

The present invention will be further described with reference to the accompanying drawings.

As shown in fig. 1 to 6, a method for filling a voltage missing value based on historical data assisted scene analysis includes the following steps:

s1, historical data of the power grid are obtained, and the process goes to step S2.

S2, the fluctuation cross-correlation coefficient of each known attribute data and the missing attribute data at the same time is calculated by a fluctuation cross-correlation analysis algorithm, and the process advances to step S3.

As shown in fig. 2, the fluctuating cross-correlation coefficient is calculated as follows:

S21, taking two equal-length time series x_i and y_i, where i＝1,2,…,N;

S22, calculating the sums of the differences of x_i and y_i from their means:

Δx(l)＝Σ_{i=1..l}(x_i-x̄), Δy(l)＝Σ_{i=1..l}(y_i-ȳ)

wherein x̄ and ȳ respectively represent the averages of x_i and y_i;

Δx(l,l₀)＝x(l₀+l)-x(l₀),l₀＝1,2,…,N-l

Δy(l,l₀)＝y(l₀+l)-y(l₀),l₀＝1,2,…,N-l

wherein: for each sampling length l＝1,2,…,N-1 there are N-l differences, indexed by l₀; Δx(l,l₀) and Δy(l,l₀) respectively represent the forward differences of the autocorrelation of x_i and y_i;

S24, calculating the covariance of x_i and y_i, wherein Cov_xy(l) denotes the covariance of x_i and y_i at sampling length l, taken over the forward differences Δx(l,l₀) and Δy(l,l₀) and their averages;

S3, if the fluctuation cross-correlation coefficient of a certain known attribute data and the missing attribute data exceeds the comparison threshold, keeping the known attribute data, and entering the step S4; otherwise, the known attribute data is discarded.

Because there are many attributes, a comparison threshold is set for the fluctuation cross-correlation coefficient in order to prevent attribute data with low correlation from influencing the filling result of the missing attribute data. If the fluctuation cross-correlation coefficient between a known attribute data and the missing attribute data is lower than the comparison threshold, the known attribute data is considered to have little or no reference value and is discarded. After the threshold comparison, M attribute data remain, and the corresponding attributes are referred to as the M Know attributes, numbered from 1 to M.

S4, the attributes corresponding to the M retained known attribute data are called Know attributes, the attribute corresponding to the missing attribute data is called the Unknow attribute, and the combination weight of each Know attribute and the Unknow attribute is calculated.

The combination weight w_j of the Know attribute j and the Unknow attribute is calculated by the following formula:

S5, carrying out scene analysis on the dates containing the Unknow attributes, and searching H dates of the most similar scenes in the historical data of the power grid; the date containing the property of Unknow is called the missing date, and the dates of the found H most similar scenes are called H similar dates.

As shown in fig. 3, the scene analysis includes the following steps:

Note: Monday to Friday, and weekend days designated as make-up working days under holiday adjustments, are working days; ordinary Saturdays and Sundays are general rest days; nationally designated festival holidays such as New Year's Day, the Spring Festival, the Qingming Festival, Labor Day, the Dragon Boat Festival, the Mid-Autumn Festival and National Day are special holidays.

S6, determining the time period of the missing attribute data in the missing date, and, for the same time period of each similar date, measuring the similarity between each Know attribute data of the missing date and each Know attribute data of that similar date through the dynamic time warping distance.

As shown in fig. 4, the similarity calculation includes the steps of:

S62, the attribute data of the Know attributes A₁, A₂, …, A_M at times t₀, t₁, t₂, …, t_2n on the h-th similar date are respectively recorded as D_(1,h), D_(2,h), …, D_(M,h), where D_(j,h)＝{d_(j,h,0), d_(j,h,1), …, d_(j,h,2n)} and d_(j,h,g) represents the attribute data of the Know attribute j at time t_g on the h-th similar date, j＝1,2,…,M, h＝1,2,…,H, g＝0,1,2,…,2n;

S7, calculating the comprehensive similarity of the Unknow attribute for each similar date by combining the combined weight of each Know attribute and the Unknow attribute.

As shown in fig. 5, the overall similarity of the Unknow attributes for each similar date is calculated by the following formula:

And S8, finding out the date with the highest comprehensive similarity of the Unknow attributes, and filling up the missing attribute data by combining the data at the same time on the date with the horizontal data.

As shown in fig. 6, the filling process of missing data by using horizontal and vertical data includes the following steps:

s81, inputting power grid data;

s82, carrying out data type division on the power grid data, wherein historical longitudinal data form a historical longitudinal database, and historical transverse data form a historical transverse database;

S83, for the historical longitudinal database, after the date with the highest comprehensive similarity of the Unknow attribute is found, extracting the data T₁ of the Unknow attribute at time t_n on that date as the longitudinal filling data, and selecting an appropriate weight ratio α;

S84, for the historical transverse database, using linear fitting of the curve of the Unknow attribute to obtain the data T₂ at time t_n as the transverse filling data, and selecting an appropriate weight ratio β;

s85, solving the final filling value of the missing attribute data as follows:

T＝α×T₁+β×T₂

α+β＝1

The method is applied to the filling analysis of missing voltage values in the power grid of a certain area. About one and a half years of historical data from a real power grid is selected as the historical data set, with a sampling period of 5 minutes. The data filling object is the voltage missing value of a 10 kV bus. The fluctuation cross-correlation coefficient is calculated for the attributes related to the missing voltage data, and the finally obtained related attributes are: {reactive load, active load, current value}. To demonstrate the advantages of the algorithm provided by the invention, the traditional Expectation-Maximization (EM) algorithm and the K-Nearest Neighbors (KNN) algorithm are selected for comparative analysis.

To fully test the effectiveness of the method provided by the invention, a random deletion strategy is adopted to delete 1%, 5%, 10%, 15%, 20%, 25% and 30% of the data in the data set. The filling result is evaluated by the filling accuracy under the different degrees of voltage missing, where the filling accuracy is evaluated as:

Accuracy＝(n_r/n)×100%

wherein: n_r is the number of correctly estimated values, and n is the number of voltage missing values. To ensure the reliability of the experimental result, the calculation is carried out 5 times for each voltage missing rate, and the average of the 5 calculations is used as the final experimental result. The experimental result is shown in fig. 7; the filling accuracy of the method provided by the invention is clearly better than that of the conventional algorithms. To further demonstrate the effect of the method, the case of a 15% missing rate is analyzed as an example. Fig. 8 compares 27 consecutive groups of voltage data under one missing condition; the curve drawn by the method provided by the invention fits the true value curve well, the filling result is close to the true value, and the filling effect is good.
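The accuracy measure can be sketched as follows. The text does not define when an estimate counts as correct, so the relative tolerance `rel_tol` used here is a hypothetical criterion chosen only for illustration, as are the sample values.

```python
def filling_accuracy(true_values, filled_values, rel_tol=0.01):
    """Percentage n_r / n * 100 of filled values counted as correct.

    A filled value is counted as correct when it lies within
    rel_tol * |true value| of the true value (assumed criterion).
    """
    if not true_values or len(true_values) != len(filled_values):
        raise ValueError("need two non-empty lists of equal length")
    n_r = sum(
        1 for t, f in zip(true_values, filled_values)
        if abs(f - t) <= rel_tol * abs(t)
    )
    return 100.0 * n_r / len(true_values)


# Hypothetical bus voltages in kV: three of the four estimates are within 1%.
accuracy = filling_accuracy(
    [10.0, 10.2, 10.4, 10.5],
    [10.05, 10.2, 11.0, 10.5],
)
```

Averaging this value over several random deletions, as in the experiment described above, reduces the influence of any single deletion pattern.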

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims

1. A voltage missing value filling method based on historical data auxiliary scene analysis, characterized by comprising the following steps:

s1, acquiring historical data of the power grid, and entering the step S2;

S7, calculating the comprehensive similarity of the Unknow attribute for each similar date by combining the combined weight of each Know attribute and the Unknow attribute;

2. The voltage missing value filling method based on historical data auxiliary scene analysis according to claim 1, wherein: in step S2, the fluctuation cross-correlation coefficient is calculated as follows:

S21, taking two equal-length time series x_i and y_i, where i＝1,2,…,N;

S22, calculating the sums of the differences of x_i and y_i from their means:

Δx(l)＝Σ_{i=1..l}(x_i-x̄), Δy(l)＝Σ_{i=1..l}(y_i-ȳ)

wherein x̄ and ȳ respectively represent the averages of x_i and y_i;

Δx(l,l₀)＝x(l₀+l)-x(l₀),l₀＝1,2,…,N-l

Δy(l,l₀)＝y(l₀+l)-y(l₀),l₀＝1,2,…,N-l

S24, calculating the covariance of x_i and y_i, wherein Cov_xy(l) denotes the covariance of x_i and y_i at sampling length l, taken over the forward differences Δx(l,l₀) and Δy(l,l₀) and their averages;

3. The voltage missing value filling method based on historical data auxiliary scene analysis according to claim 1, wherein: in step S4, the combination weight w_j of the Know attribute j and the Unknow attribute is calculated by the following formula:

wherein: M represents the number of Know attributes, j＝1,2,…,M, and c_j represents the fluctuation cross-correlation coefficient of the Know attribute j and the Unknow attribute.

4. The voltage missing value filling method based on historical data auxiliary scene analysis according to claim 1, wherein: in step S5, performing scene analysis on the date containing the Unknow attribute includes the following steps:

s51, carrying out scene classification on the historical data of the power grid according to the daily load condition; inputting a date containing an Unknow attribute and analyzing daily load conditions;

5. The voltage missing value filling method based on historical data auxiliary scene analysis according to claim 1, wherein: in step S6, the method for measuring the similarity between each piece of Know attribute data of the missing date and each piece of Know attribute data of each similar date by using the dynamic time warping distance includes the following steps:

S61, setting the moment at which the missing attribute data occurs as t_n, selecting n time points after t_n and n time points before t_n, and finally forming the time period (t₀, t_2n) of the missing attribute data in the missing date, which contains t₀, t₁, t₂, …, t_2n, a total of 2n+1 time points; the M Know attributes retained after screening against the comparison threshold are denoted A₁, A₂, …, A_M, and the Unknow attribute is denoted A₀;

S63, measuring, through the dynamic time warping distance, the similarity between the attribute data D_(j,h) of the Know attribute A_j at times t₀, t₁, t₂, …, t_2n on the h-th similar date and the attribute data D_(j,p) at times t₀, t₁, t₂, …, t_2n on the missing date, giving one similarity value for each pair (j, h), where p represents the missing date.

6. The voltage missing value filling method based on historical data auxiliary scene analysis according to claim 1, wherein: in step S7, the comprehensive similarity of the Unknow attribute on each similar date is calculated by the following formula:

7. The voltage missing value filling method based on historical data auxiliary scene analysis according to claim 1, wherein: the historical data of an attribute at the same time point every day is taken as the longitudinal historical data section of that attribute, and the transverse historical data is obtained by dividing the data of the same day according to attribute; in step S8, after the date with the highest comprehensive similarity of the Unknow attribute is found, the data T₁ of the Unknow attribute at time t_n on that date is extracted as the longitudinal filling data; meanwhile, linear fitting of the curve of the Unknow attribute is used to obtain the data T₂ at time t_n as the transverse filling data, and the final filling value of the missing attribute data is solved as:

T＝α×T₁+β×T₂

α+β＝1