CN106021452A - Electromagnetic environment measurement data cleaning method - Google Patents
- Publication number
- CN106021452A
- Authority
- CN
- China
- Prior art keywords
- data
- electromagnetic environment
- subset
- parameter
- environment measurement
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Testing Electric Properties And Detecting Electric Faults (AREA)
Abstract
The invention relates to a method for cleaning electromagnetic environment measurement data and belongs to the technical field of electromagnetic environment protection for electric power systems. The method operates on a raw data set made up of the electromagnetic environment parameters obtained in electromagnetic environment measurement and the meteorological parameters of the environment where the power transmission line is located, and it cleans the measured data with a clustering method that exploits the correlation between the electromagnetic environment and the meteorological parameters. With this method, bad data produced during electromagnetic environment measurement, and erroneous data caused by environmental interference, equipment failure and the like, can be identified and handled accordingly, ultimately yielding more valid electromagnetic environment measurement data. The method cleans electromagnetic environment measurement data conveniently, effectively and reliably, and can provide better raw data for subsequent processes such as electromagnetic environment evaluation and optimization of power transmission and transformation projects.
Description
Technical field
The invention belongs to the technical field of electromagnetic environment protection for power systems, and in particular relates to a processing method for cleaning electromagnetic environment measurement data.
Background art
The electromagnetic environment is a major consideration in the design of high-voltage power transmission and transformation systems, and the root cause of the electromagnetic environment of a transmission line is conductor corona discharge. On the one hand, corona discharge causes power loss and increases transmission cost; on the other hand, it affects the electromagnetic environment around the conductors and can further disturb people's normal lives. With economic development and growing public environmental awareness, electromagnetic environment problems attract more and more attention, and the electromagnetic environment of high-voltage transmission lines has become a main constraint on their design and operation. The electromagnetic environment parameters of high-voltage transmission lines mainly include audible noise, radio interference, ground-level total electric field strength, ground-level ion current density and corona loss. Measuring the electromagnetic environment of transmission lines is important for the subsequent analysis of the factors that influence the electromagnetic environment, for protective control of the electromagnetic environment, and for the optimal design of line parameters.
Electromagnetic environment data involve many parameters, and because both the electromagnetic environment parameters and the meteorological parameters are random, data cleaning of electromagnetic environment measurement data is indispensable. The purpose of data cleaning is to find bad data caused by environmental noise, equipment and other measurement problems, and to handle them accordingly, so that the cleaned data are better suited to subsequent data analysis. At present, however, there is no well-defined data cleaning method for electromagnetic environment measurement data. For data obtained by single-point tests, values are mostly rejected according to factors such as device thresholds and the customary value ranges of the parameters; for data obtained by simultaneous multi-point tests, the data can additionally be cleaned according to the lateral distribution characteristics of the electromagnetic environment. In general, current cleaning methods for electromagnetic environment databases still rely heavily on manual judgment and lack precise data cleaning principles.
Cluster analysis is an effective method for partitioning data in data management and analysis. Its main purpose is to divide a large data set into groups according to the similarity of the data, so that data within the same subset are highly similar while data in different subsets are highly dissimilar. Many clustering algorithms have been proposed; they can generally be divided into four classes: partitioning methods, hierarchical methods, density-based methods and grid-based methods. The most classic partitioning method is the k-means algorithm, which takes k as an input parameter and divides a set of n objects into k subsets so that similarity within each subset is high and similarity between different subsets is low.
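As an illustration of the k-means partitioning described above, the following is a minimal sketch of plain k-means (Lloyd's algorithm) in Python using only the standard library. The function name `kmeans`, the random choice of initial centers, the Euclidean distance and the iteration cap are assumptions of this sketch, not details taken from the patent.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples.

    Returns (centers, labels): the k cluster centers and, for each point,
    the index of the center it is assigned to.
    """
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the first three points from the last three. As with any k-means, the result depends on the initial centers, so on less well-separated data a different `seed` may give a different partition.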
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art by proposing an electromagnetic environment measurement data cleaning method. The method can conveniently, effectively and reliably clean electromagnetic environment measurement data, improve the validity of the raw database, and provide more reasonable data support for subsequent data analysis.
The present invention proposes an electromagnetic environment measurement data cleaning method for a raw data set composed of an electromagnetic environment parameter obtained in electromagnetic environment measurement and the meteorological parameters of the environment where the line is located. The method specifically includes the following steps:
1) Form a raw data set from any one measured electromagnetic environment parameter together with the meteorological parameters at the transmission-line measurement site. Let the raw data set have $n$ rows and $m$ columns. Each column of the raw data set represents one parameter and is denoted $a_j$, $j = 1, 2, 3, \dots, m$; each row represents the values of all parameters at the same moment, forms one data point, and is denoted $x_i$, $i = 1, 2, 3, \dots, n$. Here $a_j$ and $x_i$ are one-dimensional vectors, and $m$, $n$, $i$, $j$ are positive integers;
2) Normalize each column $a_j$ of the raw data set: let the mean of $a_j$ be $\mu$ and its standard deviation be $\sigma$; the normalized column is $a_{jj} = (a_j - \mu)/\sigma$, $jj = 1, 2, 3, \dots, m$. Replace each column $a_j$ of the raw data set with $a_{jj}$; each row $x_i$ of the raw data set then becomes a new data point, denoted $x_{ii}$, $ii = 1, 2, 3, \dots, n$. Here $a_{jj}$ and $x_{ii}$ are one-dimensional vectors, and $ii$, $jj$ are positive integers;
3) Cluster the new data set obtained in step 2) by normalizing the raw data set, with the number of clusters set to $K$ = [formula not reproduced in this text], to obtain $K$ subsets, denoted $D_1, D_2, \dots, D_K$, together with the cluster center of each subset, where round(x) denotes x rounded to the nearest integer;
4) For each subset $D_i$ obtained in step 3), compute the distance from every data point $x_{ii}$ in $D_i$ to the cluster center of $D_i$, giving the distance data set $\mathrm{Dis}_i$ of all data points in that subset to their cluster center;
5) Fit a normal distribution to each distance data set $\mathrm{Dis}_i$ obtained in step 4); according to the 3σ criterion, identify the data points lying outside the 3σ interval as bad data points, and replace them with the cluster center of their subset $D_i$;
6) Decrease the number of clusters $K$ by 1 and return to step 3);
7) Repeat steps 3) to 6) five times, or until no new bad data points appear; data cleaning is then complete.
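The following Python sketch strings steps 2) to 7) together on a list of measurement rows. It is an illustration under stated assumptions, not the patent's implementation: the formula for the initial number of clusters is not available in this text, so the starting value `k_start` is supplied by the caller; the standard deviation is taken as the population standard deviation; the distance is Euclidean; and the normal fit of step 5) is reduced to the mean and standard deviation of the distances. The function names `zscore_columns`, `kmeans` and `clean` are hypothetical, and the cleaned points are returned in normalized units.

```python
import math
import random
import statistics


def zscore_columns(rows):
    """Step 2: replace each column a_j by (a_j - mu) / sigma (population std assumed)."""
    stats = [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
        for row in rows
    ]


def kmeans(points, k, iters=100, seed=0):
    """Step 3: plain k-means (Lloyd's algorithm); returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def clean(rows, k_start, max_rounds=5):
    """Steps 2-7: returns (cleaned points in normalized units, sorted replaced row indices)."""
    points = zscore_columns(rows)
    replaced = set()
    k = k_start
    for _ in range(max_rounds):  # step 7: at most five passes of steps 3-6
        if k < 1:
            break
        centers, labels = kmeans(points, k)
        new_bad = []
        for c, center in enumerate(centers):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            if len(idx) < 2:
                continue  # empty or singleton cluster: nothing to flag
            # Step 4: distance set Dis_i of the subset's points to its center.
            dis = [math.dist(points[i], center) for i in idx]
            # Step 5: normal fit reduced to mean/std; flag points outside mu +/- 3*sigma.
            mu, sd = statistics.fmean(dis), statistics.pstdev(dis)
            for i, d in zip(idx, dis):
                if abs(d - mu) > 3 * sd:
                    points[i] = list(center)  # replace the bad point by its cluster center
                    new_bad.append(i)
        if not new_bad:
            break  # step 7: no new bad data points
        replaced.update(new_bad)
        k -= 1  # step 6: one cluster fewer in the next pass
    return points, sorted(replaced)
```

For example, with twenty rows drawn from `[[0, 0], [0, 1], [1, 0], [1, 1]]` followed by one row `[20, 20]`, `clean(rows, k_start=1)` replaces only the last row. Note that a point which k-means isolates in a cluster of its own has zero distance to its center and cannot be flagged in that pass, so the choice of $K$ matters; the sketch also stops as soon as a pass finds no new bad points, which is one reading of step 7).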
Features and beneficial effects of the present invention:
Based on the correlation between the electromagnetic environment and the meteorological parameters, the method cleans the measurement data with a clustering method. The method can identify bad data produced in electromagnetic environment measurement, as well as erroneous data caused by environmental interference, equipment faults and the like, handle them accordingly, and ultimately yield more valid electromagnetic environment measurement data. The invention cleans electromagnetic environment measurement data conveniently, effectively and reliably, and can provide better raw data for subsequent processes such as electromagnetic environment assessment and optimization.
Brief description of the drawings
Fig. 1 is a flow block diagram of the method of the invention.
Detailed description of the invention
The electromagnetic environment measurement data cleaning method proposed by the present invention is described in detail below with reference to the accompanying drawing and a specific embodiment.
The electromagnetic environment measurement data cleaning method proposed by the present invention is intended for a raw data set composed of an electromagnetic environment parameter obtained in electromagnetic environment measurement and the meteorological parameters of the environment where the transmission line is located. As shown in Fig. 1, it specifically includes the following steps:
1) Form a raw data set from any one measured electromagnetic environment parameter (one of audible noise, radio interference and similar parameters) together with the meteorological parameters at the transmission-line measurement site (including wind speed, temperature and atmospheric pressure). Let the raw data set have $n$ rows and $m$ columns. Each column of the raw data set represents one parameter and is denoted $a_j$, $j = 1, 2, 3, \dots, m$; each row represents the values of all parameters at the same moment, forms one data point, and is denoted $x_i$, $i = 1, 2, 3, \dots, n$. Here $a_j$ and $x_i$ are one-dimensional vectors, and $m$, $n$, $i$, $j$ are positive integers;
2) Normalize each column $a_j$ of the raw data set: let the mean of $a_j$ be $\mu$ and its standard deviation be $\sigma$; the normalized column is $a_{jj} = (a_j - \mu)/\sigma$, $jj = 1, 2, 3, \dots, m$. Replace each column $a_j$ of the raw data set with $a_{jj}$; each row $x_i$ of the raw data set then becomes a new data point, denoted $x_{ii}$, $ii = 1, 2, 3, \dots, n$. Here $a_{jj}$ and $x_{ii}$ are one-dimensional vectors, and $ii$, $jj$ are positive integers;
3) Cluster the new data set obtained in step 2) by normalizing the raw data set, with the number of clusters set to $K$ = [formula not reproduced in this text], to obtain $K$ subsets, denoted $D_1, D_2, \dots, D_K$, together with the cluster center of each subset, where round(x) denotes x rounded to the nearest integer; this embodiment uses the k-means method for clustering;
4) For each subset $D_i$ obtained in step 3), compute the distance from each data point $x_{ii}$ in that class to the cluster center of the class; the Euclidean distance is generally used as the distance metric. This gives the distance data set $\mathrm{Dis}_i$ of all data points in the subset to their cluster center;
5) Fit a normal distribution to each distance data set $\mathrm{Dis}_i$ obtained in step 4); according to the 3σ criterion, identify the data points lying outside the 3σ interval as bad data points, and replace them with the cluster center of their subset $D_i$;
6) Decrease the number of clusters $K$ by 1 and return to step 3);
7) Repeat steps 3) to 6) five times, or until no new bad data points appear; data cleaning is then complete.
The calculations in steps 3) to 6) above can all be carried out in general-purpose data computing software; this embodiment uses Matlab as the computing software.
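With the Euclidean distance named in step 4), steps 4) and 5) amount to the test below. The symbols $c_i$ (the cluster center of $D_i$), $x_{ii,jj}$ (the $jj$-th component of $x_{ii}$), $d_{ii}$, $\mu_{\mathrm{Dis}_i}$ and $\sigma_{\mathrm{Dis}_i}$ are notation introduced here rather than taken from the text. The 3σ interval is read as $[\mu - 3\sigma, \mu + 3\sigma]$ of the normal distribution fitted to $\mathrm{Dis}_i$; for a maximum-likelihood fit, its parameters are the mean and the population standard deviation of the distances.

$$d_{ii} = \lVert x_{ii} - c_i \rVert_2 = \sqrt{\sum_{jj=1}^{m} \left(x_{ii,jj} - c_{i,jj}\right)^2}, \qquad x_{ii} \in D_i$$

$$\mu_{\mathrm{Dis}_i} = \frac{1}{|D_i|} \sum_{x_{ii} \in D_i} d_{ii}, \qquad \sigma_{\mathrm{Dis}_i}^2 = \frac{1}{|D_i|} \sum_{x_{ii} \in D_i} \left(d_{ii} - \mu_{\mathrm{Dis}_i}\right)^2$$

$$x_{ii} \text{ is a bad data point if } \left| d_{ii} - \mu_{\mathrm{Dis}_i} \right| > 3\,\sigma_{\mathrm{Dis}_i}, \text{ in which case } x_{ii} \leftarrow c_i$$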
Claims (1)
1. An electromagnetic environment measurement data cleaning method, characterized in that the method is intended for a raw data set composed of an electromagnetic environment parameter obtained in electromagnetic environment measurement and the meteorological parameters of the environment where the transmission line is located, and specifically includes the following steps:
1) Form a raw data set from any one measured electromagnetic environment parameter together with the meteorological parameters at the transmission-line measurement site. Let the raw data set have $n$ rows and $m$ columns. Each column of the raw data set represents one parameter and is denoted $a_j$, $j = 1, 2, 3, \dots, m$; each row represents the values of all parameters at the same moment, forms one data point, and is denoted $x_i$, $i = 1, 2, 3, \dots, n$. Here $a_j$ and $x_i$ are one-dimensional vectors, and $m$, $n$, $i$, $j$ are positive integers;
2) Normalize each column $a_j$ of the raw data set: let the mean of $a_j$ be $\mu$ and its standard deviation be $\sigma$; the normalized column is $a_{jj} = (a_j - \mu)/\sigma$, $jj = 1, 2, 3, \dots, m$. Replace each column $a_j$ of the raw data set with $a_{jj}$; each row $x_i$ of the raw data set then becomes a new data point, denoted $x_{ii}$, $ii = 1, 2, 3, \dots, n$. Here $a_{jj}$ and $x_{ii}$ are one-dimensional vectors, and $ii$, $jj$ are positive integers;
3) Cluster the new data set obtained in step 2) by normalizing the raw data set, with the number of clusters set to $K$ = [formula not reproduced in this text], to obtain $K$ subsets, denoted $D_1, D_2, \dots, D_K$, together with the cluster center of each subset, where round(x) denotes x rounded to the nearest integer;
4) For each subset $D_i$ obtained in step 3), compute the distance from every data point $x_{ii}$ in $D_i$ to the cluster center of $D_i$, giving the distance data set $\mathrm{Dis}_i$ of all data points in that subset to their cluster center;
5) Fit a normal distribution to each distance data set $\mathrm{Dis}_i$ obtained in step 4); according to the 3σ criterion, identify the data points lying outside the 3σ interval as bad data points, and replace them with the cluster center of their subset $D_i$;
6) Decrease the number of clusters $K$ by 1 and return to step 3);
7) Repeat steps 3) to 6) five times, or until no new bad data points appear; data cleaning is then complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610325629.2A CN106021452A (en) | 2016-05-16 | 2016-05-16 | Electromagnetic environment measurement data cleaning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610325629.2A CN106021452A (en) | 2016-05-16 | 2016-05-16 | Electromagnetic environment measurement data cleaning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106021452A true CN106021452A (en) | 2016-10-12 |
Family
ID=57098388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610325629.2A Pending CN106021452A (en) | 2016-05-16 | 2016-05-16 | Electromagnetic environment measurement data cleaning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106021452A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684308A (en) * | 2017-10-18 | 2019-04-26 | 南方电网科学研究院有限责任公司 | Electromagnetic environment parameter consistency cleaning method and device based on pattern search |
CN109684320A (en) * | 2018-12-25 | 2019-04-26 | 清华大学 | The method and apparatus of monitoring data on-line cleaning |
CN110472801A (en) * | 2019-08-26 | 2019-11-19 | 南方电网科学研究院有限责任公司 | DC power transmission line electromagnetic environment appraisal procedure and system |
CN110866074A (en) * | 2019-07-02 | 2020-03-06 | 黑龙江省电工仪器仪表工程技术研究中心有限公司 | Electric energy meter improved K-means classification method based on regional characteristics |
CN112783883A (en) * | 2021-01-22 | 2021-05-11 | 广东电网有限责任公司东莞供电局 | Power data standardized cleaning method and device under multi-source data access |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831431A (en) * | 2012-02-05 | 2012-12-19 | 四川大学 | Detector training method based on hierarchical clustering |
CN104462819A (en) * | 2014-12-09 | 2015-03-25 | 国网四川省电力公司信息通信公司 | Local outlier detection method based on density clustering |
2016
- 2016-05-16 CN CN201610325629.2A patent/CN106021452A/en active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684308A (en) * | 2017-10-18 | 2019-04-26 | 南方电网科学研究院有限责任公司 | Electromagnetic environment parameter consistency cleaning method and device based on pattern search |
CN109684308B (en) * | 2017-10-18 | 2020-11-17 | 南方电网科学研究院有限责任公司 | Electromagnetic environment parameter consistency cleaning method and device based on pattern search |
CN109684320A (en) * | 2018-12-25 | 2019-04-26 | 清华大学 | The method and apparatus of monitoring data on-line cleaning |
CN109684320B (en) * | 2018-12-25 | 2020-09-15 | 清华大学 | Method and equipment for online cleaning of monitoring data |
CN110866074A (en) * | 2019-07-02 | 2020-03-06 | 黑龙江省电工仪器仪表工程技术研究中心有限公司 | Electric energy meter improved K-means classification method based on regional characteristics |
CN110866074B (en) * | 2019-07-02 | 2022-11-04 | 黑龙江省电工仪器仪表工程技术研究中心有限公司 | Electric energy meter improved K-means classification method based on regional characteristics |
CN110472801A (en) * | 2019-08-26 | 2019-11-19 | 南方电网科学研究院有限责任公司 | DC power transmission line electromagnetic environment appraisal procedure and system |
CN112783883A (en) * | 2021-01-22 | 2021-05-11 | 广东电网有限责任公司东莞供电局 | Power data standardized cleaning method and device under multi-source data access |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cai et al. | Classification of power quality disturbances using Wigner-Ville distribution and deep convolutional neural networks | |
CN103076547B (en) | Method for identifying GIS (Gas Insulated Switchgear) local discharge fault type mode based on support vector machines | |
Mahela et al. | Recognition of power quality disturbances using S-transform based ruled decision tree and fuzzy C-means clustering classifiers | |
CN106021452A (en) | Electromagnetic environment measurement data cleaning method | |
Liu et al. | High-precision identification of power quality disturbances under strong noise environment based on FastICA and random forest | |
CN107589341B (en) | Single-phase grounding online fault positioning method based on distribution automation main station | |
CN108520023A (en) | A kind of identification of thunderstorm core and method for tracing based on Hybrid Clustering Algorithm | |
CN106651031B (en) | Lightning stroke flashover method for early warning and system based on historical information | |
Zaro et al. | Power quality detection and classification using S-transform and rule-based decision tree | |
Broderick et al. | Clustering methodology for classifying distribution feeders | |
CN105447502A (en) | Transient power disturbance identification method based on S conversion and improved SVM algorithm | |
CN109061774A (en) | A kind of thunderstorm core relevance processing method | |
CN109470985A (en) | A kind of voltage sag source identification methods based on more resolution singular value decompositions | |
You et al. | Research of an improved wavelet threshold denoising method for transformer partial discharge signal | |
CN111999591B (en) | Method for identifying abnormal state of primary equipment of power distribution network | |
CN103337248A (en) | Airport noise event recognition method based on time series kernel clustering | |
Mahela et al. | Recognition of power quality disturbances using discrete wavelet transform and fuzzy C-means clustering | |
CN110245692A (en) | A kind of hierarchy clustering method for Ensemble Numerical Weather Prediction member | |
Alshahrani et al. | Detection and classification of power quality disturbances based on Hilbert-Huang transform and feed forward neural networks | |
CN104182511B (en) | A kind of fuzzy distribution clustering method that compacts of cluster characteristic weighing | |
CN106707118B (en) | Partial Discharge Pattern Recognition Method and device | |
CN109214402A (en) | A kind of people having the same aspiration and interest unit grouping method of combination WAVELET FUZZY entropy and GG fuzzy clustering | |
CN104835174A (en) | Robustness model fitting method based on supermap mode search | |
Du et al. | Feature Selection-Based Low Voltage AC Arc Fault Diagnosis Method | |
Klinginsmith et al. | Unsupervised clustering on pmu data for event characterization on smart grid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161012 |