CN113378335A - Water supply network pressure prediction method and system based on machine learning - Google Patents
Water supply network pressure prediction method and system based on machine learning
- Publication number
- CN113378335A (application CN202110497008.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- pressure
- water supply
- historical
- supply network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/18—Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/02—CAD in a network environment, e.g. collaborative CAD or distributed simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/14—Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/14—Force analysis or force optimisation, e.g. static or dynamic forces
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a water supply network pressure prediction method and a system based on machine learning, wherein the method comprises the following steps: acquiring historical pressure data; carrying out data preprocessing on the historical pressure data to obtain sample pressure data; constructing a water supply network pressure prediction model based on the sample pressure data; and performing pressure prediction analysis by adopting a water supply network pressure prediction model. The invention constructs a high-precision and high-efficiency water supply network pressure prediction model through machine learning based on historical pressure data, and is used for accurately predicting the pressure conditions of specific time points and specific monitoring points, so that a response plan can be prepared in advance to cope with the possible special conditions, stable water supply is ensured, and the pressure prediction requirement on a water supply network system is effectively met.
Description
Technical Field
The invention relates to the technical field of water supply network pressure management and machine learning, in particular to a water supply network pressure prediction method and system based on machine learning.
Background
The stability and safety of the water supply network are of great significance to residents' daily life, industrial production, and urban development. Urban water supply networks are structurally complex, large in scale, and subject to highly random water consumption. Because water demand differs between regions, dispatchers must ensure that every region receives a stable supply during peak consumption periods while also carrying out routine operation and maintenance monitoring. To do so, they monitor the operating condition of the network through its water supply pressure, predict the supply pressure at each monitoring point from measured pressure data and professional experience, and then adjust and schedule the supply pressure accordingly.
In actual production, the pressure of the water supply network is usually predicted with a time series method, a structural analysis method, or a system analysis method. The time series method has relatively high prediction precision but is only suitable for short-term prediction; the structural analysis method has poor prediction efficiency when there are many influencing factors; the system analysis method has self-learning capability and is suitable for predicting nonlinear dynamic systems, but its models are complex, its training time is long, and its requirements on training data are high. In addition, these methods are susceptible to noise and their prediction precision is unstable, so they struggle to meet the pressure prediction requirements of a water supply network system.
Disclosure of Invention
The embodiment of the invention discloses a water supply network pressure prediction method and system based on machine learning.
The embodiment of the invention discloses a water supply network pressure prediction method based on machine learning in a first aspect, which comprises the following steps:
acquiring historical pressure data;
carrying out data preprocessing on the historical pressure data to obtain sample pressure data;
constructing a water supply network pressure prediction model based on the sample pressure data;
and performing pressure prediction analysis by using the water supply network pressure prediction model.
Preferably, after the obtaining of the historical pressure data and before the performing of the data preprocessing on the historical pressure data to obtain the sample pressure data, the method further includes:
performing water pressure distribution analysis on the historical pressure data to obtain a water pressure distribution condition;
periodically analyzing the historical pressure data to obtain a periodic change rule of each monitoring point corresponding to each time period;
setting a data preprocessing rule based on the water pressure distribution condition and the periodic variation rule, and performing data preprocessing on the historical pressure data;
the data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, wherein the value supplementing method adopts a Lagrange interpolation method.
Preferably, the preprocessing the historical pressure data to obtain sample pressure data includes:
adopting the cleaning method to screen out invalid data in the historical pressure data, wherein the invalid data comprises repeated data, irrelevant data and error data;
detecting default value data existing in the historical pressure data, and determining dependent variables and independent variables of the historical pressure data;
carrying out value taking on a plurality of normal data distributed before and after the default data to obtain a plurality of default sets, wherein each default data corresponds to each default set;
interpolating the default value data by adopting a Lagrange interpolation method based on the dependent variable, the independent variable and the default value set to obtain intermediate historical pressure data;
and carrying out normalization processing on the intermediate historical pressure data by adopting the transformation method to obtain the sample pressure data.
Preferably, the constructing a water supply network pressure prediction model based on the sample pressure data includes:
dividing the sample pressure data into training data samples and testing data samples;
sorting the training data samples in time-gradient order, selecting a data instances from the training data samples based on each monitoring point and each time-gradient node, and constructing a data subset A;
randomly sampling b data instances from the training data samples, excluding the a data instances already selected, and constructing a data subset B;
setting a weight of (1-a)/b for the gradients of the data instances in the data subset B;
eliminating data instances with weight values lower than a preset weight threshold value in the data subset A and the data subset B by adopting a GOSS algorithm, and calculating information gain;
reducing feature dimensions by adopting an EFB algorithm;
and constructing a decision tree by adopting a leaf-wise method, and generating the water supply network pressure prediction model based on the decision tree.
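Preferably, removing, by using the GOSS algorithm, the data instances in the data subset A and the data subset B whose weight values are lower than a preset weight threshold, and calculating the information gain includes calculating the information gain by using the following formula:
$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\left(\sum_{x_i\in A_l} g_i+\frac{1-a}{b}\sum_{x_i\in B_l} g_i\right)^2}{n_l^j(d)}+\frac{\left(\sum_{x_i\in A_r} g_i+\frac{1-a}{b}\sum_{x_i\in B_r} g_i\right)^2}{n_r^j(d)}\right)$$
where $n$ is the total number of samples of the data instances, $d$ is the split point of feature $j$, $g_i$ is the gradient of data instance $x_i$, $n_l^j(d)$ is the total number of samples of the left child node of the d node, and $n_r^j(d)$ is the total number of samples of the right child node of the d node;
and $A_l=\{x_i\in A: x_{ij}\le d\}$; $A_r=\{x_i\in A: x_{ij}>d\}$; $B_l=\{x_i\in B: x_{ij}\le d\}$; $B_r=\{x_i\in B: x_{ij}>d\}$.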
The second aspect of the embodiment of the invention discloses a water supply network pressure prediction system based on machine learning, which comprises:
a data acquisition unit for acquiring historical pressure data;
the preprocessing unit is used for preprocessing the historical pressure data to obtain sample pressure data;
the model construction unit is used for constructing a water supply network pressure prediction model based on the sample pressure data;
and the prediction analysis unit is used for performing pressure prediction analysis by adopting the water supply network pressure prediction model.
Preferably, the system further comprises:
the distribution analysis unit is used for carrying out water pressure distribution analysis on the historical pressure data to obtain a water pressure distribution condition;
the period analysis unit is used for periodically analyzing the historical pressure data to obtain a period change rule of each monitoring point corresponding to each time period;
the rule setting unit is used for setting a data preprocessing rule based on the water pressure distribution condition and the periodic variation rule and carrying out data preprocessing on the historical pressure data;
the data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, wherein the value supplementing method adopts a Lagrange interpolation method.
Preferably, the preprocessing unit includes:
the cleaning subunit is used for screening out invalid data in the historical pressure data by adopting the cleaning method, wherein the invalid data comprises repeated data, irrelevant data and error data;
the default value detection subunit is used for detecting default value data existing in the historical pressure data and determining a dependent variable and an independent variable of the historical pressure data;
a set obtaining subunit, configured to take values of a plurality of normal data distributed before and after the default data to obtain a plurality of default sets, where each default data corresponds to each default set;
the interpolation subunit is used for interpolating the default value data by adopting a Lagrange interpolation method based on the dependent variable, the independent variable and the default value set to obtain intermediate historical pressure data;
and the normalization subunit is used for performing normalization processing on the intermediate historical pressure data by adopting the transformation method to obtain the sample pressure data.
Preferably, the model building unit includes:
the sample dividing subunit is used for dividing the sample pressure data into training data samples and test data samples;
the first construction subunit is used for sorting the training data samples in time-gradient order, selecting a data instances from the training data samples based on each monitoring point and each time-gradient node, and constructing a data subset A;
the second construction subunit is used for randomly sampling b data instances from the training data samples, excluding the a data instances already selected, and constructing a data subset B;
a gradient setting subunit, configured to set a weight of (1-a)/b for the gradients of the data instances in the data subset B;
the gain calculation subunit is used for eliminating data instances with weight values lower than a preset weight threshold value in the data subset A and the data subset B by adopting a GOSS algorithm and calculating information gain;
the dimension reduction subunit is used for reducing the characteristic dimension by adopting an EFB algorithm;
and the construction subunit is used for constructing a decision tree by adopting a leaf-wise method and generating the water supply network pressure prediction model based on the decision tree.
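Preferably, in the gain calculating subunit, the information gain is calculated by using the following formula:
$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\left(\sum_{x_i\in A_l} g_i+\frac{1-a}{b}\sum_{x_i\in B_l} g_i\right)^2}{n_l^j(d)}+\frac{\left(\sum_{x_i\in A_r} g_i+\frac{1-a}{b}\sum_{x_i\in B_r} g_i\right)^2}{n_r^j(d)}\right)$$
where $n$ is the total number of samples of the data instances, $d$ is the split point of feature $j$, $g_i$ is the gradient of data instance $x_i$, $n_l^j(d)$ is the total number of samples of the left child node of the d node, and $n_r^j(d)$ is the total number of samples of the right child node of the d node;
and $A_l=\{x_i\in A: x_{ij}\le d\}$; $A_r=\{x_i\in A: x_{ij}>d\}$; $B_l=\{x_i\in B: x_{ij}\le d\}$; $B_r=\{x_i\in B: x_{ij}>d\}$.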
The third aspect of the embodiment of the invention discloses a water supply network pressure prediction system based on machine learning, which comprises:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the water supply network pressure prediction method based on machine learning disclosed by the first aspect of the embodiment of the invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program enables a computer to execute a water supply network pressure prediction method based on machine learning disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
A sixth aspect of the present embodiment discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where the computer program product is configured to, when running on a computer, cause the computer to perform part or all of the steps of any one of the methods in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
based on historical pressure data, a high-precision and high-efficiency water supply network pressure prediction model is constructed through machine learning and used for accurately predicting the pressure conditions of specific time points and specific monitoring points, so that a response plan can be prepared in advance to cope with the possible special conditions, stable water supply is ensured, and the pressure prediction requirement on a water supply network system is effectively met.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a water supply network pressure prediction method based on machine learning according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a water supply network pressure prediction system based on machine learning according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another water supply network pressure prediction system based on machine learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", "third", "fourth", and the like in the description and the claims of the present invention are used for distinguishing different objects, and are not used for describing a specific order. The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a water supply network pressure prediction method and system based on machine learning, wherein a high-precision and high-efficiency water supply network pressure prediction model is constructed through machine learning based on historical pressure data and is used for accurately predicting the pressure conditions of a specific time point and a specific monitoring point, so that a response plan can be prepared in advance to cope with the possible special conditions, stable water supply is ensured, and the pressure prediction requirement on a water supply network system is effectively met.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a water supply network pressure prediction method based on machine learning according to an embodiment of the present invention. As shown in fig. 1, the water supply network pressure prediction method based on machine learning may include the following steps.
101. Historical pressure data is obtained.
In the embodiment of the invention, water supply organizations, such as the water supply network system and the service system of a tap water company, install water supply pressure detection equipment at the monitoring points. Historical pressure data accumulated through long-term monitoring is obtained from these organizations, and the detection data of the water supply pressure detection equipment can also be obtained continuously by establishing a data transmission channel.
The historical pressure data comprises geographic information of monitoring points, time information, unit time pressure data and the like.
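One way to picture a single historical pressure record is the following minimal sketch. The class name, field names, coordinate representation, and the MPa unit are illustrative assumptions; the text only states that a record carries geographic information, time information, and pressure data per unit time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PressureRecord:
    """One historical pressure reading (all field names are hypothetical)."""
    point_id: str                   # identifier of the monitoring point
    latitude: float                 # geographic information of the monitoring point
    longitude: float
    timestamp: datetime             # time information
    pressure_mpa: Optional[float]   # pressure per unit time; None marks a missing value


record = PressureRecord("P-001", 23.13, 113.26, datetime(2021, 5, 1, 8, 0), 0.32)
```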
102. And carrying out data preprocessing on the historical pressure data to obtain sample pressure data.
In the embodiment of the invention, the historical pressure data needs to be preprocessed to ensure the accuracy and effectiveness of the data, and a proper data preprocessing method is selected according to the data rule and the data characteristics of the historical pressure data before data preprocessing.
As an optional implementation manner, after acquiring the historical pressure data and before performing data preprocessing on the historical pressure data to obtain sample pressure data, performing water pressure distribution analysis on the historical pressure data to obtain a water pressure distribution condition; periodically analyzing the historical pressure data to obtain a periodic change rule of each monitoring point corresponding to each time period; setting a data preprocessing rule based on the water pressure distribution condition and the periodic variation rule, and performing data preprocessing aiming at historical pressure data; the data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, wherein the value supplementing method adopts a Lagrange interpolation method. Specifically, water pressure distribution analysis is performed based on geographic information of each monitoring point, pipeline specifications, water pressure distribution conditions of each interface of the pipeline and the like, and water pressure distribution conditions corresponding to each monitoring point are obtained; and analyzing the periodic change rules of the water pressure of each monitoring point in different time periods by taking year, quarter, month, day, hour and the like as the time periods, and further setting data preprocessing rules such as a cleaning method, a value supplementing method, a conversion method and the like based on the water pressure distribution condition and the periodic change rules, so that the data preprocessing process is ensured to conform to the data rules and the data characteristics of historical pressure data, and the processing efficiency is improved.
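The periodic analysis described above can be sketched, for the hourly time period only, as an average pressure per monitoring point and hour of day. The function name and the tuple layout of a reading are assumptions made for illustration; other periods (year, quarter, month, day) would be grouped in the same way.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(readings):
    """Average pressure per (monitoring point, hour of day).

    readings: iterable of (point_id, timestamp, pressure) tuples (assumed layout).
    """
    buckets = defaultdict(list)
    for point_id, ts, pressure in readings:
        buckets[(point_id, ts.hour)].append(pressure)
    return {key: mean(values) for key, values in buckets.items()}


readings = [
    ("P-001", datetime(2021, 5, 1, 8), 0.30),
    ("P-001", datetime(2021, 5, 2, 8), 0.34),
    ("P-001", datetime(2021, 5, 1, 20), 0.26),
]
profile = hourly_profile(readings)
```

A profile of this kind could then inform the preprocessing rules, for example by flagging readings that deviate strongly from the usual value for that hour.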
As an optional implementation manner, a cleaning method is adopted to screen out invalid data in the historical pressure data, wherein the invalid data comprises repeated data, irrelevant data and error data; default value data existing in the historical pressure data is detected, and dependent variables and independent variables of the historical pressure data are determined; a plurality of normal data distributed before and after the default value data are taken to obtain a plurality of default value sets, wherein each default value data corresponds to one default value set; the default value data is interpolated by adopting a Lagrange interpolation method based on the dependent variable, the independent variable and the default value set to obtain intermediate historical pressure data; and the intermediate historical pressure data is normalized by adopting a transformation method to obtain sample pressure data. Specifically, the historical pressure data is identified based on the data film Bayer, and repeated data generated by repeated collection or transmission errors, irrelevant data collected by mistake, and error data generated by transmission errors are screened out. Then, default value data, that is, data with a correct format but a missing value, is detected in the historical pressure data: the dependent variable and the independent variable in the historical pressure data are determined for the Lagrange interpolation method, and the default value data is identified based on them. The 5 normal data before and the 5 normal data after each default value data are taken out, and the 10 normal data so obtained form one default value set, so that one default value data corresponds to one default value set. The following Lagrange interpolation formula is then applied to the normal data in the default value set to interpolate the corresponding default value data:
$$L_n(x)=\sum_{i=0}^{n} y_i \prod_{j=0,\, j\neq i}^{n}\frac{x-x_j}{x_i-x_j}$$
where $x$ is the subscript number corresponding to the missing-value data, $L_n(x)$ is the interpolation result for the missing-value data, and $x_i$ is the subscript number of the normal data $y_i$.
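The interpolation step can be sketched as follows. This is a minimal illustration that assumes the subscript numbers are simply list positions and that a missing value is stored as `None`; the function names and the parameter `k` (5 in the text) are hypothetical.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(series, k=5):
    """Fill None entries from up to k normal values before and after each gap."""
    filled = list(series)
    for idx, value in enumerate(series):
        if value is not None:
            continue
        before = [i for i in range(idx - 1, -1, -1) if series[i] is not None][:k]
        after = [i for i in range(idx + 1, len(series)) if series[i] is not None][:k]
        xs = sorted(before + after)          # subscripts of the default value set
        if not xs:
            continue                         # nothing to interpolate from
        ys = [series[i] for i in xs]
        filled[idx] = lagrange_interpolate(xs, ys, idx)
    return filled


series = [0.30, 0.31, None, 0.33, 0.34]
filled = fill_missing(series)
```

Because the four surrounding values in this example lie on a straight line, the interpolated value at position 2 is approximately 0.32. With ten nodes the polynomial has degree nine, so an implementation may want to check for oscillation near the ends of the window.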
Therefore, invalid data in the historical pressure data are screened out, missing value data with missing values are supplemented, intermediate historical pressure data are obtained, effectiveness and objectivity of the data are effectively maintained, and waste of data resources is avoided.
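The screening of invalid data mentioned above might look like the sketch below. The rules are assumptions chosen for illustration: a repeated record is one with the same monitoring point and timestamp, an irrelevant record comes from a point outside the network under study, and an error record has a pressure outside an assumed plausible range. Missing values are kept so that they can be interpolated afterwards.

```python
def clean_readings(readings, valid_points, low=0.0, high=1.6):
    """Drop repeated, irrelevant and erroneous readings.

    readings: list of (point_id, timestamp, pressure) tuples (assumed layout).
    valid_points: monitoring points that belong to the network under study.
    low, high: assumed plausible pressure range; values outside it count as errors.
    """
    seen = set()
    cleaned = []
    for point_id, ts, pressure in readings:
        key = (point_id, ts)
        if key in seen:                       # repeated data
            continue
        if point_id not in valid_points:      # irrelevant data
            continue
        if pressure is not None and not (low <= pressure <= high):  # error data
            continue
        seen.add(key)
        cleaned.append((point_id, ts, pressure))
    return cleaned


raw = [
    ("P-001", "2021-05-01T08:00", 0.30),
    ("P-001", "2021-05-01T08:00", 0.30),   # repeated
    ("X-999", "2021-05-01T08:00", 0.31),   # irrelevant monitoring point
    ("P-001", "2021-05-01T09:00", -4.0),   # outside the plausible range
    ("P-001", "2021-05-01T10:00", None),   # missing value, kept for interpolation
]
cleaned = clean_readings(raw, {"P-001"})
```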
Furthermore, according to the requirements of subsequent analysis and mining, the intermediate historical pressure data is normalized by a transformation method such as simple function transformation, continuous attribute discretization, or attribute construction to obtain the sample pressure data, so that the sample pressure data matches the data interface type required by the model construction and mining analysis tasks and can be used by them directly.
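As one example of a simple function transformation, min-max scaling maps the pressure values to the range [0, 1]. The choice of min-max scaling is an assumption; the text does not name a specific function.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([0.20, 0.30, 0.40])
```

The minimum and maximum would normally be taken from the training data only and reused for later readings, so that training and prediction share the same scale.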
103. And constructing a water supply network pressure prediction model based on the sample pressure data.
In the embodiment of the invention, the calculation precision is ensured by taking measures based on weight sampling, and the calculation efficiency is improved by reducing the characteristic dimension.
As an optional implementation, the sample pressure data is divided into training data samples and test data samples; the training data samples are sorted in time-gradient order, and a data instances are selected from the training data samples based on each monitoring point and each time-gradient node to construct a data subset A; b data instances are randomly sampled from the training data samples, excluding the a data instances already selected, to construct a data subset B; a weight of (1-a)/b is set for the gradients of the data instances in the data subset B; data instances in the data subset A and the data subset B whose weight values are lower than a preset weight threshold are removed by adopting the GOSS (Gradient-based One-Side Sampling) algorithm, and the information gain is calculated; the EFB (Exclusive Feature Bundling) algorithm is adopted to reduce feature dimensions; and a decision tree is constructed by adopting a leaf-wise method, and the water supply network pressure prediction model is generated based on the decision tree. Specifically, after the data subset A and the data subset B are obtained by GOSS, the information gain is calculated by the following formula:
$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\left(\sum_{x_i\in A_l} g_i+\frac{1-a}{b}\sum_{x_i\in B_l} g_i\right)^2}{n_l^j(d)}+\frac{\left(\sum_{x_i\in A_r} g_i+\frac{1-a}{b}\sum_{x_i\in B_r} g_i\right)^2}{n_r^j(d)}\right)$$
where $n$ is the total number of samples of the data instances, $d$ is the split point of feature $j$, $g_i$ is the gradient of data instance $x_i$, $n_l^j(d)$ is the total number of samples of the left child node of the d node, and $n_r^j(d)$ is the total number of samples of the right child node of the d node;
and $A_l=\{x_i\in A: x_{ij}\le d\}$; $A_r=\{x_i\in A: x_{ij}>d\}$; $B_l=\{x_i\in B: x_{ij}\le d\}$; $B_r=\{x_i\in B: x_{ij}>d\}$;
Therefore, the data set is reduced, and the calculation amount of information gain is greatly reduced;
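The sampling and gain calculation above can be sketched as follows. The sketch treats a and b as sampling fractions and ranks instances by absolute gradient, which follows the usual description of GOSS and is an assumption here; the function names, the fixed random seed, and the example data are hypothetical.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return (subset A indices, subset B indices, weight for subset B)."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * n)
    rand_n = int(b * n)
    top_idx = order[:top_n]                        # subset A: largest |gradient|
    rng = random.Random(seed)
    rest_idx = rng.sample(order[top_n:], rand_n)   # subset B: random draw from the rest
    weight = (1 - a) / b                           # amplification applied to subset B
    return top_idx, rest_idx, weight


def variance_gain(feature, gradients, top_idx, rest_idx, weight, d, n):
    """Estimated gain of splitting one feature at threshold d."""
    def side(indices, left):
        return [i for i in indices if (feature[i] <= d) == left]

    gain = 0.0
    for left in (True, False):
        a_side, b_side = side(top_idx, left), side(rest_idx, left)
        count = len(a_side) + len(b_side)          # n_l or n_r over the sampled data
        if count == 0:
            continue
        s = sum(gradients[i] for i in a_side) + weight * sum(gradients[i] for i in b_side)
        gain += s * s / count
    return gain / n


gradients = [0.9, -0.8, 0.1, -0.05, 0.02, 0.03, -0.01, 0.04, 0.6, -0.02]
feature = [float(i) for i in range(10)]
top_idx, rest_idx, weight = goss_sample(gradients)
gain = variance_gain(feature, gradients, top_idx, rest_idx, weight, d=4.5, n=len(gradients))
```

Only the sampled instances enter the sums, which is where the reduction in computation comes from; the weight keeps the contribution of the small-gradient instances roughly in proportion to their share of the full data set.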
Then, through EFB, the feature dimensions are sorted by their number of non-zero values, the conflict ratio between different feature dimensions is calculated, and combinations of feature dimensions are attempted so as to minimize the conflict ratio, thereby minimizing the number of feature dimensions and further improving calculation efficiency.
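A greedy version of this bundling step is sketched below. It counts a conflict as a row in which two features are both non-zero and places each feature into the first bundle whose total conflicts stay within an allowed rate. The function names, the feature names, and the greedy strategy are assumptions for illustration rather than the exact procedure of the embodiment.

```python
def conflict_count(col_a, col_b):
    """Number of rows in which both features are non-zero."""
    return sum(1 for x, y in zip(col_a, col_b) if x != 0 and y != 0)


def greedy_bundle(columns, max_conflict_rate=0.0):
    """Group feature columns into bundles of (almost) mutually exclusive features.

    columns: dict mapping feature name -> list of values, one per row.
    """
    n_rows = len(next(iter(columns.values())))
    # Visit features with more non-zero values first.
    order = sorted(columns, key=lambda name: sum(v != 0 for v in columns[name]), reverse=True)
    bundles = []
    for name in order:
        for bundle in bundles:
            conflicts = sum(conflict_count(columns[name], columns[other]) for other in bundle)
            if conflicts <= max_conflict_rate * n_rows:
                bundle.append(name)
                break
        else:
            bundles.append([name])   # no compatible bundle found, start a new one
    return bundles


columns = {
    "valve_a": [1, 0, 0, 2],
    "valve_b": [0, 3, 0, 0],
    "flow": [5, 6, 7, 8],
}
bundles = greedy_bundle(columns)
```

Here `valve_a` and `valve_b` are never non-zero in the same row, so they share a bundle, while the dense `flow` feature stays on its own.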
According to the calculation result, a decision tree is constructed by adopting the leaf-wise method, with a maximum depth preset for the decision tree to avoid overfitting; once the decision tree has been constructed, the water supply network pressure prediction model is generated from it.
Therefore, the application of GOSS and EFB effectively improves the calculation efficiency, meanwhile, the balance between the calculation efficiency and the calculation precision is realized, and the decision tree limited by the maximum depth ensures that the water supply network pressure prediction model does not generate overfitting.
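Leaf-wise growth with a depth limit can be illustrated with a small regression tree that always splits the leaf offering the largest gain. This is a simplified sketch using a squared-error gain on the raw targets; the function names, the `max_leaves` parameter, and the example data are assumptions, and a production model would typically rely on a gradient boosting library instead.

```python
def best_split(rows, targets, indices):
    """Return (gain, feature, threshold) of the best split, or None if none exists."""
    total = sum(targets[i] for i in indices)
    base = total * total / len(indices)
    best = None
    for f in range(len(rows[0])):
        ordered = sorted(indices, key=lambda i: rows[i][f])
        left_sum = 0.0
        for pos in range(1, len(ordered)):
            left_sum += targets[ordered[pos - 1]]
            lo, hi = rows[ordered[pos - 1]][f], rows[ordered[pos]][f]
            if lo == hi:
                continue                     # cannot split between equal values
            right_sum = total - left_sum
            gain = left_sum ** 2 / pos + right_sum ** 2 / (len(ordered) - pos) - base
            if best is None or gain > best[0]:
                best = (gain, f, lo)         # samples with value <= lo go left
    return best


def grow_leaf_wise(rows, targets, max_leaves=4, max_depth=3):
    """Grow a tree by repeatedly splitting the leaf with the largest gain."""
    leaves = [{"indices": list(range(len(rows))), "depth": 0}]
    while len(leaves) < max_leaves:
        candidates = []
        for leaf in leaves:
            if leaf["depth"] >= max_depth or len(leaf["indices"]) < 2:
                continue                     # depth limit guards against overfitting
            split = best_split(rows, targets, leaf["indices"])
            if split is not None and split[0] > 1e-12:
                candidates.append((split, leaf))
        if not candidates:
            break
        (gain, f, thr), leaf = max(candidates, key=lambda c: c[0][0])
        left = [i for i in leaf["indices"] if rows[i][f] <= thr]
        right = [i for i in leaf["indices"] if rows[i][f] > thr]
        leaves.remove(leaf)
        leaves.append({"indices": left, "depth": leaf["depth"] + 1})
        leaves.append({"indices": right, "depth": leaf["depth"] + 1})
    # Each leaf predicts the mean target of its samples.
    return [(leaf["indices"], sum(targets[i] for i in leaf["indices"]) / len(leaf["indices"]))
            for leaf in leaves]


rows = [[0], [1], [2], [3], [10], [11], [12], [13]]   # e.g. hour of day
targets = [0.30, 0.30, 0.30, 0.30, 0.20, 0.20, 0.20, 0.20]
leaves = grow_leaf_wise(rows, targets)
```

In this example the tree stops after one split, because both resulting leaves are already homogeneous and no further split offers any gain.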
104. And performing pressure prediction analysis by adopting a water supply network pressure prediction model.
In the embodiment of the invention, the water supply network pressure prediction model continuously acquires the monitoring data of each monitoring point for model iteration. As time passes and the volume of iteration data grows, the prediction precision of the model increases, so that the pressure conditions at specific time points and specific monitoring points can be accurately predicted even in a very large water supply network system. This makes it possible to identify special conditions that may occur, such as peak water consumption and excessively high or low node water pressure, to prepare a response plan in advance, to respond in an orderly manner when such conditions arise, and thus to achieve stable water supply.
In conclusion, based on historical pressure data, a high-precision and high-efficiency water supply network pressure prediction model is constructed through machine learning and used for accurately predicting the pressure conditions of specific time points and specific monitoring points, so that a response plan can be prepared in advance to cope with the possible special conditions, stable water supply is ensured, and the pressure prediction requirement on a water supply network system is effectively met.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a water supply network pressure prediction system based on machine learning according to an embodiment of the present invention. As shown in fig. 2, the water supply network pressure prediction system based on machine learning may include:
a data acquisition unit 201, configured to acquire historical pressure data;
a distribution analysis unit 202, configured to perform water pressure distribution analysis on the historical pressure data to obtain a water pressure distribution condition;
a period analysis unit 203, configured to perform periodic analysis on the historical pressure data to obtain a periodic change rule of each monitoring point corresponding to each time period;
a rule setting unit 204, configured to set a data preprocessing rule based on the water pressure distribution condition and the periodic change rule, so as to perform data preprocessing on the historical pressure data;
wherein the data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, and the value supplementing method adopts a Lagrange interpolation method;
a preprocessing unit 205, configured to perform data preprocessing on the historical pressure data to obtain sample pressure data;
a model construction unit 206, configured to construct a water supply network pressure prediction model based on the sample pressure data;
and a prediction analysis unit 207, configured to perform pressure prediction analysis by using the water supply network pressure prediction model.
The preprocessing unit 205 includes:
a cleaning subunit 2051, configured to screen out invalid data in the historical pressure data by using the cleaning method, where the invalid data includes repeated data, irrelevant data, and error data;
a default value detection subunit 2052, configured to detect default value data (data with a missing value) existing in the historical pressure data, and determine a dependent variable and an independent variable of the historical pressure data;
a set obtaining subunit 2053, configured to take values of a plurality of normal data distributed before and after the default value data to obtain a plurality of default value sets, where each piece of default value data corresponds to one default value set;
an interpolation subunit 2054, configured to interpolate the default value data by using the Lagrange interpolation method based on the dependent variable, the independent variable, and the default value set to obtain intermediate historical pressure data;
and a normalization subunit 2055, configured to perform normalization processing on the intermediate historical pressure data by using the transformation method to obtain sample pressure data.
The model construction unit 206 includes:
a sample dividing subunit 2061, configured to divide the sample pressure data into training data samples and test data samples;
a first construction subunit 2062, configured to sort the training data samples according to the time gradient sequence, and capture a data instances in the training data samples based on each monitoring point and each time gradient node, so as to construct a data subset A;
a second construction subunit 2063, configured to randomly sample B data instances from the training data samples that do not include the first data instances, and construct a data subset B;
a gradient setting subunit 2064, configured to set a gradient of (1-a)/B for the data instances in the data subset B;
a gain calculation subunit 2065, configured to eliminate, by using the GOSS algorithm, the data instances in the data subset A and the data subset B whose weight values are lower than a preset weight threshold, and calculate an information gain;
where d is the total number of samples of the data instance, $g_i$ is the gradient, and the two remaining quantities are the total number of samples of the left child node of the d node and the total number of samples of the right child node of the d node, respectively;
and $A_l = \{x_i \in A : x_{ij} \le d\}$; $A_r = \{x_i \in A : x_{ij} > d\}$; $B_l = \{x_i \in B : x_{ij} \le d\}$; $B_r = \{x_i \in B : x_{ij} > d\}$;
a dimension reduction subunit 2066, configured to reduce the feature dimension by using the EFB algorithm;
a construction subunit 2067, configured to construct a decision tree by using a leaf-wise method, and generate the water supply network pressure prediction model based on the decision tree.
As an optional implementation, after the historical pressure data is acquired and before data preprocessing is performed on it to obtain sample pressure data, the following steps are carried out: water pressure distribution analysis is performed on the historical pressure data to obtain a water pressure distribution condition; periodic analysis is performed on the historical pressure data to obtain a periodic change rule of each monitoring point for each time period; and a data preprocessing rule is set based on the water pressure distribution condition and the periodic change rule, for use in preprocessing the historical pressure data. The data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, wherein the value supplementing method adopts a Lagrange interpolation method. Specifically, water pressure distribution analysis is performed based on the geographic information of each monitoring point, the pipeline specifications, the water pressure distribution at each pipeline interface, and the like, to obtain the water pressure distribution condition corresponding to each monitoring point. The periodic change rule of the water pressure at each monitoring point is analyzed over time periods such as year, quarter, month, day and hour. Data preprocessing rules such as the cleaning method, the value supplementing method and the transformation method are then set based on the water pressure distribution condition and the periodic change rule, so that the data preprocessing conforms to the regularities and characteristics of the historical pressure data and processing efficiency is improved.
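The periodic analysis by time period can be illustrated with a minimal sketch that averages pressure per monitoring point and hour of day. The record layout `(point_id, timestamp, pressure)`, the function name `hourly_profile`, and the sample values are assumptions made for illustration only; other periods (day, month, quarter, year) would be grouped in the same way.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(readings):
    """Average pressure per (monitoring point, hour of day).

    `readings` is an iterable of (point_id, timestamp, pressure) tuples;
    this record layout is a hypothetical one chosen for the sketch.
    """
    buckets = defaultdict(list)
    for point_id, ts, pressure in readings:
        buckets[(point_id, ts.hour)].append(pressure)
    return {key: mean(values) for key, values in buckets.items()}


readings = [
    ("P1", datetime(2021, 5, 1, 8, 0), 0.30),
    ("P1", datetime(2021, 5, 2, 8, 0), 0.32),
    ("P1", datetime(2021, 5, 1, 20, 0), 0.25),
    ("P2", datetime(2021, 5, 1, 8, 0), 0.40),
]
profile = hourly_profile(readings)
```

The resulting dictionary maps each (monitoring point, hour) pair to its mean pressure, which is one simple way to expose a daily cycle before choosing preprocessing rules.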
As an optional implementation, the cleaning subunit 2051 screens out invalid data in the historical pressure data by using the cleaning method, where the invalid data includes repeated data, irrelevant data, and error data; the default value detection subunit 2052 detects default value data present in the historical pressure data, and determines a dependent variable and an independent variable of the historical pressure data; the set obtaining subunit 2053 takes values of a plurality of normal data distributed before and after the default value data to obtain a plurality of default value sets, each piece of default value data corresponding to one default value set; the interpolation subunit 2054 interpolates the default value data by the Lagrange interpolation method based on the dependent variable, the independent variable and the default value set to obtain intermediate historical pressure data; and the normalization subunit 2055 performs normalization processing on the intermediate historical pressure data by using the transformation method to obtain sample pressure data.
Specifically, the historical pressure data is identified based on the data film Bayer, and repeated data generated by repeated collection or transmission errors, irrelevant data collected by mistake, and erroneous data generated by transmission errors are screened out. Default value data, that is, data with a correct format but a missing value, is then detected in the historical pressure data: a Lagrange interpolation method is adopted to determine the dependent variable and the independent variable in the historical pressure data, and the default value data is identified based on the dependent variable and the independent variable. The 5 normal data points before and the 5 normal data points after each piece of default value data are taken out, and these 10 normal data points form a default value set, so that each piece of default value data corresponds to one default value set. The default value data is then interpolated from the normal data in its default value set using the following Lagrange interpolation formula:
$$L_n(x) = \sum_{i=0}^{n} y_i \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$
where x is the subscript number corresponding to the default value data, $L_n(x)$ is the interpolation result for the default value data, and $x_i$ is the subscript number of the normal data $y_i$.
In this way, invalid data in the historical pressure data is screened out and the default value data is filled in to obtain the intermediate historical pressure data, which preserves the validity and objectivity of the data and avoids wasting data resources.
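The value supplementing step can be sketched as follows: each missing value is filled from up to 5 normal values on each side by Lagrange interpolation over the subscript numbers. The function names, the use of `None` to mark a missing value, and the sample series are assumptions made for illustration.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(series, k=5):
    """Fill None entries using up to k normal values on each side."""
    filled = list(series)
    for idx, value in enumerate(series):
        if value is not None:
            continue
        # Nearest k normal subscripts before and after the missing one.
        before = [i for i in range(idx - 1, -1, -1) if series[i] is not None][:k]
        after = [i for i in range(idx + 1, len(series)) if series[i] is not None][:k]
        xs = sorted(before + after)
        if not xs:
            continue  # no normal data to interpolate from
        ys = [series[i] for i in xs]
        filled[idx] = lagrange_interpolate(xs, ys, idx)
    return filled


# Values lying on a straight line, with one missing reading at subscript 5.
series = [0.30, 0.31, 0.32, 0.33, 0.34, None, 0.36, 0.37, 0.38, 0.39, 0.40]
filled = fill_missing(series)
```

Because the sample values lie on a straight line, the interpolated value at subscript 5 falls back on that line, which makes the behavior easy to check by eye.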
Furthermore, the normalization subunit 2055 performs normalization processing on the intermediate historical pressure data by using conversion methods such as simple function conversion, continuous attribute discretization or attribute construction and the like according to the requirements of subsequent analysis and mining to obtain sample pressure data, thereby ensuring that the sample pressure data can be directly applied to the data interface types of the model construction and mining analysis tasks.
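As one possible instance of a simple function transformation, min-max scaling maps the intermediate pressure values onto the range [0, 1]. The choice of min-max scaling, the function name `min_max_normalize`, and the sample values are assumptions; the text does not state which transformation is used.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalize([200.0, 300.0, 400.0])
```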
As an alternative embodiment, the sample dividing subunit 2061 divides the sample pressure data into training data samples and test data samples; the first construction subunit 2062 sorts the training data samples according to the time gradient sequence, and captures a data instances in the training data samples based on each monitoring point and each time gradient node to construct a data subset A; the second construction subunit 2063 randomly samples B data instances from the training data samples that do not include the first data instances, and constructs a data subset B; the gradient setting subunit 2064 sets the gradient to (1-a)/B for the data instances in the data subset B; the gain calculation subunit 2065 uses the GOSS (Gradient-based One-Side Sampling) algorithm to eliminate the data instances in the data subset A and the data subset B whose weight values are lower than the preset weight threshold, and calculates the information gain; the dimension reduction subunit 2066 reduces the feature dimension by using the EFB (Exclusive Feature Bundling) algorithm; and the construction subunit 2067 constructs a decision tree by a leaf-wise method, and generates the water supply network pressure prediction model based on the decision tree. Specifically, after data subset A and data subset B are obtained using GOSS, the information gain is calculated by the following formula:
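One form consistent with the definitions that follow is the variance-gain estimate published with the GOSS algorithm in LightGBM. The symbols $n$, $b$, $n_l^j(d)$ and $n_r^j(d)$ are taken from that publication and are assumptions here; $b$ plays the role of the B used in (1-a)/B, and $n_l^j(d)$ and $n_r^j(d)$ count the samples in the left and right child nodes:

$$\tilde{V}_j(d) = \frac{1}{n}\left( \frac{\left(\sum_{x_i \in A_l} g_i + \frac{1-a}{b}\sum_{x_i \in B_l} g_i\right)^2}{n_l^j(d)} + \frac{\left(\sum_{x_i \in A_r} g_i + \frac{1-a}{b}\sum_{x_i \in B_r} g_i\right)^2}{n_r^j(d)} \right)$$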
where d is the total number of samples of the data instance, $g_i$ is the gradient, and the two remaining quantities are the total number of samples of the left child node of the d node and the total number of samples of the right child node of the d node, respectively;
and $A_l = \{x_i \in A : x_{ij} \le d\}$; $A_r = \{x_i \in A : x_{ij} > d\}$; $B_l = \{x_i \in B : x_{ij} \le d\}$; $B_r = \{x_i \in B : x_{ij} > d\}$.
The data set is therefore reduced, which greatly reduces the amount of computation needed for the information gain.
EFB then sorts the feature dimensions by their number of non-zero values, calculates the conflict ratio between different feature dimensions, and attempts to merge feature dimensions so as to minimize the conflict ratio, thereby minimizing the number of feature dimensions and further improving computational efficiency.
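The bundling idea can be sketched as a greedy pass: features are ordered by non-zero count, and each feature joins the first bundle with which it conflicts on no more than an allowed share of rows. Here two features are taken to conflict on a row when both are non-zero; the function names, the `max_conflict_ratio` parameter, and the toy columns are assumptions made for illustration.

```python
def conflict_count(col_a, col_b):
    """Number of rows where both features are non-zero at the same time."""
    return sum(1 for x, y in zip(col_a, col_b) if x != 0 and y != 0)


def greedy_bundle(columns, max_conflict_ratio=0.0):
    """Group feature names into bundles whose members rarely conflict."""
    n_rows = len(next(iter(columns.values())))
    limit = max_conflict_ratio * n_rows
    # Features with more non-zero values are placed first.
    order = sorted(
        columns,
        key=lambda name: sum(v != 0 for v in columns[name]),
        reverse=True,
    )
    bundles = []
    for name in order:
        for bundle in bundles:
            conflicts = sum(
                conflict_count(columns[name], columns[other]) for other in bundle
            )
            if conflicts <= limit:
                bundle.append(name)
                break
        else:
            bundles.append([name])  # no compatible bundle found
    return bundles


columns = {
    "f1": [1, 0, 2, 0],
    "f2": [0, 3, 0, 0],
    "f3": [4, 0, 0, 5],
}
bundles = greedy_bundle(columns)
```

In this toy input `f1` and `f3` are both non-zero in the first row, so they stay in separate bundles, while `f2` never overlaps with `f1` and is merged with it.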
Based on the calculation results, a decision tree is constructed using a leaf-wise method, with a maximum depth preset for the decision tree to avoid overfitting; once the decision tree has been constructed, the water supply network pressure prediction model is generated from it.
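Leaf-wise growth with a preset maximum depth can be illustrated with a minimal sketch that repeatedly splits whichever leaf offers the largest gain. It assumes a single numeric feature and a squared-error gain; `best_split`, `grow_leaf_wise`, and the `max_leaves` limit are hypothetical names introduced for the sketch, not part of the described system.

```python
def best_split(xs, ys):
    """Return (gain, threshold) of the best split by squared-error reduction."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    base, best = sse(ys), (0.0, None)
    for d in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= d]
        right = [y for x, y in zip(xs, ys) if x > d]
        gain = base - sse(left) - sse(right)
        if gain > best[0]:
            best = (gain, d)
    return best


def grow_leaf_wise(xs, ys, max_depth=3, max_leaves=4):
    """Split the leaf with the largest gain until a depth or leaf limit is hit."""
    leaves = [{"xs": xs, "ys": ys, "depth": 0}]
    while len(leaves) < max_leaves:
        candidates = [
            (best_split(leaf["xs"], leaf["ys"]), leaf)
            for leaf in leaves
            if leaf["depth"] < max_depth  # the depth cap limits overfitting
        ]
        candidates = [(s, leaf) for s, leaf in candidates if s[1] is not None]
        if not candidates:
            break
        (gain, d), leaf = max(candidates, key=lambda c: c[0][0])
        leaves.remove(leaf)
        for keep in (lambda x: x <= d, lambda x: x > d):
            pairs = [(x, y) for x, y in zip(leaf["xs"], leaf["ys"]) if keep(x)]
            leaves.append({
                "xs": [p[0] for p in pairs],
                "ys": [p[1] for p in pairs],
                "depth": leaf["depth"] + 1,
            })
    return leaves


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 1, 1, 1, 5, 5, 9, 9]
leaves = grow_leaf_wise(xs, ys, max_depth=3, max_leaves=4)
```

Unlike level-wise growth, only the most promising leaf is expanded at each step, which is why a depth limit is usually paired with this strategy.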
In conclusion, based on historical pressure data, a high-precision and high-efficiency water supply network pressure prediction model is constructed through machine learning and used for accurately predicting the pressure conditions of specific time points and specific monitoring points, so that a response plan can be prepared in advance to cope with the possible special conditions, stable water supply is ensured, and the pressure prediction requirement on a water supply network system is effectively met.
Example three
Referring to fig. 3, fig. 3 is a schematic structural diagram of another water supply network pressure prediction system based on machine learning according to an embodiment of the present disclosure. As shown in fig. 3, the water supply network pressure prediction system based on machine learning may include:
a memory 301 storing executable program code;
a processor 302 coupled to the memory 301;
wherein the processor 302 invokes the executable program code stored in the memory 301 to perform the water supply network pressure prediction method based on machine learning of fig. 1.
The embodiment of the invention also discloses a computer-readable storage medium which stores a computer program, wherein the computer program causes a computer to execute the water supply network pressure prediction method based on machine learning of fig. 1.
Embodiments of the present invention also disclose a computer program product, wherein, when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing the associated hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other memory, such as magnetic disk memory or tape memory, or any other computer-readable medium that can be used to carry or store data.
The water supply network pressure prediction method and system based on machine learning disclosed in the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A water supply network pressure prediction method based on machine learning, the method comprising:
acquiring historical pressure data;
carrying out data preprocessing on the historical pressure data to obtain sample pressure data;
constructing a water supply network pressure prediction model based on the sample pressure data;
and performing pressure prediction analysis by using the water supply network pressure prediction model.
2. The method of claim 1, wherein after said obtaining historical pressure data and before said pre-processing said historical pressure data to obtain sample pressure data, said method further comprises:
performing water pressure distribution analysis on the historical pressure data to obtain a water pressure distribution condition;
periodically analyzing the historical pressure data to obtain a periodic change rule of each monitoring point corresponding to each time period;
setting a data preprocessing rule based on the water pressure distribution condition and the periodic variation rule, and performing data preprocessing on the historical pressure data;
the data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, wherein the value supplementing method adopts a Lagrange interpolation method.
3. The method of claim 2, wherein the pre-processing the historical pressure data to obtain sample pressure data comprises:
adopting the cleaning method to screen out invalid data in the historical pressure data, wherein the invalid data comprises repeated data, irrelevant data and error data;
detecting default value data existing in the historical pressure data, and determining dependent variables and independent variables of the historical pressure data;
carrying out value taking on a plurality of normal data distributed before and after the default data to obtain a plurality of default sets, wherein each default data corresponds to each default set;
interpolating the default value data by adopting a Lagrange interpolation method based on the dependent variable, the independent variable and the default value set to obtain intermediate historical pressure data;
and carrying out normalization processing on the intermediate historical pressure data by adopting the transformation method to obtain the sample pressure data.
4. The method of claim 3, wherein constructing a water supply network pressure prediction model based on the sample pressure data comprises:
dividing the sample pressure data into training data samples and testing data samples;
sequencing the training data samples according to a time gradient sequence, capturing a data examples in the training data samples based on each monitoring point and each time gradient node, and constructing to obtain a data subset A;
randomly sampling B data examples in the training data samples which do not comprise the first data example, and constructing to obtain a data subset B;
setting a gradient of (1-a)/B for the data instances in the data subset B;
eliminating data instances with weight values lower than a preset weight threshold value in the data subset A and the data subset B by adopting a GOSS algorithm, and calculating information gain;
reducing feature dimensions by adopting an EFB algorithm;
and constructing a decision tree by adopting a leaf-wise method, and generating the water supply network pressure prediction model based on the decision tree.
5. The method according to claim 4, wherein said using GOSS algorithm to cull data instances in said data subset A and said data subset B having weight values lower than a predetermined weight threshold, and calculating information gain comprises:
where d is the total number of samples of the data instance, $g_i$ is the gradient, and the two remaining quantities are the total number of samples of the left child node of the d node and the total number of samples of the right child node of the d node, respectively;
and $A_l = \{x_i \in A : x_{ij} \le d\}$; $A_r = \{x_i \in A : x_{ij} > d\}$; $B_l = \{x_i \in B : x_{ij} \le d\}$; $B_r = \{x_i \in B : x_{ij} > d\}$.
6. A water supply network pressure prediction system based on machine learning, the system comprising:
a data acquisition unit for acquiring historical pressure data;
the preprocessing unit is used for preprocessing the historical pressure data to obtain sample pressure data;
the model construction unit is used for constructing a water supply network pressure prediction model based on the sample pressure data;
and the prediction analysis unit is used for performing pressure prediction analysis by adopting the water supply network pressure prediction model.
7. The system of claim 6, further comprising:
the distribution analysis unit is used for carrying out water pressure distribution analysis on the historical pressure data to obtain a water pressure distribution condition;
the period analysis unit is used for periodically analyzing the historical pressure data to obtain a period change rule of each monitoring point corresponding to each time period;
the rule setting unit is used for setting a data preprocessing rule based on the water pressure distribution condition and the periodic variation rule and carrying out data preprocessing on the historical pressure data;
the data preprocessing rule comprises a cleaning method, a value supplementing method and a transformation method, wherein the value supplementing method adopts a Lagrange interpolation method.
8. The system of claim 7, wherein the preprocessing unit comprises:
the cleaning subunit is used for screening out invalid data in the historical pressure data by adopting the cleaning method, wherein the invalid data comprises repeated data, irrelevant data and error data;
the default value detection subunit is used for detecting default value data existing in the historical pressure data and determining a dependent variable and an independent variable of the historical pressure data;
a set obtaining subunit, configured to take values of a plurality of normal data distributed before and after the default data to obtain a plurality of default sets, where each default data corresponds to each default set;
the interpolation subunit is used for interpolating the default value data by adopting a Lagrange interpolation method based on the dependent variable, the independent variable and the default value set to obtain intermediate historical pressure data;
and the normalization subunit is used for performing normalization processing on the intermediate historical pressure data by adopting the transformation method to obtain the sample pressure data.
9. The system according to claim 8, wherein the model construction unit comprises:
the sample dividing subunit is used for dividing the sample pressure data into training data samples and test data samples;
the first construction subunit is used for sequencing the training data samples according to the time gradient sequence, capturing a data examples in the training data samples based on each monitoring point and each time gradient node, and constructing to obtain a data subset A;
the second construction subunit is used for randomly sampling B data instances in the training data samples which do not comprise the first data instance, and constructing a data subset B;
a gradient setting subunit, configured to set a gradient of (1-a)/B for the data instance in the data subset B;
the gain calculation subunit is used for eliminating data instances with weight values lower than a preset weight threshold value in the data subset A and the data subset B by adopting a GOSS algorithm and calculating information gain;
the dimension reduction subunit is used for reducing the characteristic dimension by adopting an EFB algorithm;
and the construction subunit is used for constructing a decision tree by adopting a leaf-wise method and generating the water supply network pressure prediction model based on the decision tree.
10. The system of claim 9, wherein the gain calculating subunit calculates the information gain using the following formula
where d is the total number of samples of the data instance, $g_i$ is the gradient, and the two remaining quantities are the total number of samples of the left child node of the d node and the total number of samples of the right child node of the d node, respectively;
and $A_l = \{x_i \in A : x_{ij} \le d\}$; $A_r = \{x_i \in A : x_{ij} > d\}$; $B_l = \{x_i \in B : x_{ij} \le d\}$; $B_r = \{x_i \in B : x_{ij} > d\}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110497008.3A CN113378335A (en) | 2021-05-07 | 2021-05-07 | Water supply network pressure prediction method and system based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113378335A (en) | 2021-09-10 |
Family
ID=77570792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110497008.3A Pending CN113378335A (en) | 2021-05-07 | 2021-05-07 | Water supply network pressure prediction method and system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378335A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9558453B1 (en) * | 2015-12-21 | 2017-01-31 | International Business Machines Corporation | Forecasting leaks in pipeline network |
CN108764540A (en) * | 2018-05-16 | 2018-11-06 | 杭州电子科技大学 | Water supply network pressure prediction method based on parallel LSTM series connection DNN |
CN111414717A (en) * | 2020-03-02 | 2020-07-14 | 浙江大学 | XGBoost-LightGBM-based unit power prediction method |
CN111476422A (en) * | 2020-04-10 | 2020-07-31 | 北京石油化工学院 | LightGBM building cold load prediction method based on machine learning framework |
CN111753399A (en) * | 2020-05-26 | 2020-10-09 | 中南大学 | A method for predicting pressure drop in a slurry-filled loop using machine learning |
CN112597701A (en) * | 2020-12-18 | 2021-04-02 | 中国石油大学(北京) | Method, device and equipment for determining pressure in pipeline |
CN112733443A (en) * | 2020-12-31 | 2021-04-30 | 北京工业大学 | Water supply network model parameter optimization checking method based on virtual monitoring points |
Non-Patent Citations (1)
Title |
---|
揣雪雨 (Chuai Xueyu): "Research on a Personal Credit Evaluation Model Based on the LightGBM Algorithm", China Master's Theses Full-text Database * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210910 |