CN116523388B - Data-driven quality modeling method based on industrial Internet platform
- Publication number: CN116523388B
- Application number: CN202310408969.1A
- Authority: CN (China)
- Prior art keywords: data, sample, model, representing, matrix
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a data-driven quality modeling method based on an industrial Internet platform, comprising the following steps: collecting data from different systems through the industrial Internet platform and aggregating the data in a unified manner; preprocessing the collected data; selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimension of the auxiliary variables by principal component analysis; constructing a key product quality prediction model based on a data-driven modeling strategy; and performing deviation correction and model parameter correction on the established prediction model. The method can greatly reduce a factory's need for measuring equipment and is of significance for improving product quality, promoting energy conservation and consumption reduction, and accelerating the digital transformation of enterprises. It can also predict the key indexes of chemical raw materials and products in real time, avoiding the long delays, difficulty, or impossibility of measuring certain indexes, and thereby saving considerable time and resources.
Description
Technical Field
The invention relates to the field of industrial Internet, in particular to a data-driven quality modeling method based on an industrial Internet platform.
Background
In petrochemical processes, the simulation, control and optimization of a system often rely on high-performance models. With intensifying market competition and increasingly strict environmental protection requirements in recent years, enterprises urgently need to extract as much economic benefit as possible from limited resources. This places new demands on process control and optimization and also increases modeling difficulty, particularly for strongly nonlinear and time-varying objects, such as the physical and chemical parameters of chemical processes in a continuous stirred tank reactor or the biological parameters of fermentation processes. For example, the continuous stirred tank reactor (CSTR) is widely used in polymerization chemistry; it is not only core equipment in chemical production but is also commonly used in the dye, pharmaceutical reagent, food and synthetic materials industries. However, automatic control of the reaction process in such reactors has developed slowly, mainly because the reaction involves many physical and chemical interactions that make it highly nonlinear, which in turn makes process modeling very difficult.
In actual chemical production, owing to a lack of technical means and hardware, the core production system cannot feed back all required process parameters in real time, yet better control of the reaction process requires data from within that process. Unlike variables such as temperature, pressure, liquid level and volume, which are relatively easy to measure in real time, parameters such as reactant concentration lack reliable sensors for online detection, and measuring them is costly. As a result, many industrial production systems cannot rely on fault diagnosis and status detection to improve operational safety, which also seriously affects product quality. In addition, other factors, such as the temperature and the feedstock concentration within the reactor, may be disturbed during production, which introduces uncertainty into the model.
With the arrival of the era of big data and the industrial Internet, algorithm research represented by data mining and machine learning has opened up a new approach to, and pointed out a new direction for, the intelligent development of the chemical industry. A data-driven quality modeling method based on an industrial Internet platform offers greater flexibility and closer relevance to actual conditions; by exploiting its strong learning and representation capabilities, it can fully mine the important information in historical data and establish accurate prediction models for the quality indexes of key raw materials and products.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a data-driven quality modeling method based on an industrial Internet platform, so as to overcome the technical problems in the prior related art.
For this purpose, the invention adopts the following specific technical scheme:
A data-driven quality modeling method based on an industrial Internet platform, the method comprising the following steps:
S1, collecting data from different systems based on the industrial Internet platform, and uniformly aggregating the data;
S2, preprocessing the collected data;
S3, selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimension of the auxiliary variables by principal component analysis;
S4, constructing a key product quality prediction model based on a data-driven modeling strategy;
S5, performing deviation correction and model parameter correction on the established prediction model.
Further, the data preprocessing of the collected data includes the following steps:
S201, fusing and storing the collected data to obtain sample data;
S202, eliminating abnormal data from the sample data, filtering the sample data, and normalizing the data.
Further, the calculation formula for fusing the collected data is as follows:
wherein $h_{1q}$ represents the data collected by the business system at time $t_{1q}$;
$h_{2q}$ represents the data collected by the production system at time $t_{2q}$;
$\varepsilon_{h1}$ represents the root mean square error of the collected data $h_{1q}$;
$\varepsilon_{t1}$ represents the root mean square error of the time $t_{1q}$;
$\varepsilon_{h2}$ represents the root mean square error of the collected data $h_{2q}$;
$\varepsilon_{t2}$ represents the root mean square error of the time $t_{2q}$;
$h_q$ represents the result of fusing the data collected by the business system and the production system at time $t_q$.
Furthermore, abnormal data are eliminated from the sample data by screening with the 3σ criterion, with the following specific steps:
suppose an auxiliary variable $x$ in the sample data has n observations $x_1, x_2, \dots, x_n$; its mean $\bar{x}$ and standard deviation $\sigma$ are calculated:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
if an observation $x_i$ of the auxiliary variable in a sample satisfies the following formula:
$$\left|x_i - \bar{x}\right| > 3\sigma$$
then the sample is removed as an abnormal sample; the 3σ test is applied in turn to the other auxiliary variables in the sample, and the samples that pass the screening are added to the modeling sample set;
Further, the sample data are filtered by applying mean filtering to the samples with the following formula:
$$X(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \dots + X(t) + \dots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$
wherein $t$ represents the sampling time;
$T$ represents the filtering time constant;
$T_c$ represents the sampling period.
Further, the data normalization maps the sample data to $[y_{\min}, y_{\max}]$ by the following formula:
$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$
wherein $y_{\min}$ and $y_{\max}$ represent the lower and upper bounds of the normalization target;
$x_{\min}$ and $x_{\max}$ represent the lower and upper bounds of the values of the current variable.
Further, the principal component analysis method comprises the following calculation steps:
1) Standardizing the original sample data and forming a standardized matrix:
let the m-dimensional random vector be $X = (X_1, X_2, \dots, X_m)^T$ with n samples $X_k = (X_{1k}, X_{2k}, \dots, X_{mk})^T$ $(k = 1, 2, \dots, n)$, where the superscript $T$ denotes the matrix transpose; the samples form the sample matrix, which is standardized as follows. Sample mean:
$$\bar{X}_i = \frac{1}{n}\sum_{k=1}^{n} X_{ik}$$
sample variance:
$$s_i^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(X_{ik} - \bar{X}_i\right)^2$$
the standardized data are:
$$\tilde{X}_{ik} = \frac{X_{ik} - \bar{X}_i}{s_i}$$
wherein $i = 1, 2, \dots, m$ and $k = 1, 2, \dots, n$,
forming the standardized matrix $\tilde{X} = (\tilde{X}_{ik})$;
2) Calculating the sample correlation coefficient matrix of the standardized matrix:
$$R = (r_{ij})_{m \times m}, \qquad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} \tilde{X}_{ik}\,\tilde{X}_{jk}$$
wherein $r_{ij}$ represents the element in row i, column j of matrix R $(i, j = 1, 2, \dots, m)$;
3) Determining the principal components:
solving the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix R yields m eigenvalues, wherein $\lambda$ represents an eigenvalue and $I_m$ represents the identity matrix; since R is a symmetric matrix, the eigenvalues are obtained by the Jacobi method; the value of p is determined according to
$$\frac{\sum_{j=1}^{p}\lambda_j}{\sum_{j=1}^{m}\lambda_j} \ge 0.85$$
so that the information utilization rate reaches 85% or more, giving p principal components; for each $\lambda_j$ $(j = 1, 2, \dots, p)$, the system of equations $Rb = \lambda_j b$ is solved to obtain the unit eigenvector $b_j$, where $b$ denotes an eigenvector;
4) Converting the standardized index variables into principal components:
$$U_j = b_j^{T}\tilde{X}, \qquad j = 1, 2, \dots, m$$
wherein $U_1$ is called the first principal component, $U_2$ the second principal component, and $U_m$ the m-th principal component;
5) Comprehensively evaluating the m principal components: the m principal components are weighted and summed to obtain the final evaluation value, the weight of each principal component being its variance contribution rate.
Furthermore, the key product quality prediction model based on the data-driven modeling strategy is constructed by selecting an algorithm from the machine learning algorithm library built into the industrial Internet platform and modeling with the preprocessed data.
Further, the deviation correction of the prediction model includes: while the model is running, new data are used to correct the model, and a deviation correction method corrects the model output according to the model prediction error, wherein the calculation formula of the deviation correction method is as follows:
$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$
wherein $\hat{y}_c(t)$ represents the corrected output value of the model at the current moment;
$\hat{y}(t)$ represents the predicted value output by the model at the current moment;
$K$ represents the correction coefficient;
$y(t-1)$ and $\hat{y}(t-1)$ represent the real value and the predicted value output by the model at the previous moment;
$t$ represents the sampling time;
the correction coefficient is obtained by dividing the model error of the current period by the model error of the previous period:
$$K_i = \frac{Y(t_i) - Y_m(t_i)}{Y(t_i - t) - Y_m(t_i - t)}$$
wherein $Y(t_i)$ represents the data within the current period;
$Y_m(t_i)$ represents the average of the predicted values in the current period;
$Y(t_i - t)$ represents the data in the previous period;
$Y_m(t_i - t)$ represents the median of the predicted values in the previous period;
$K = \mathrm{median}(K_i)$, that is, the correction coefficient is obtained by taking the median of the $K_i$.
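The deviation correction described above can be illustrated with a short sketch. It assumes that the corrected output is the current prediction plus K times the previous prediction error, and that K is the median of the ratios between current-period and previous-period errors; the function names and the numbers are hypothetical and chosen only for illustration.

```python
from statistics import median

def correction_coefficient(y_cur, pred_cur, y_prev, pred_prev):
    """Median of the ratios of current-period error to previous-period error."""
    ratios = [
        (yc - pc) / (yp - pp)
        for yc, pc, yp, pp in zip(y_cur, pred_cur, y_prev, pred_prev)
        if yp != pp  # skip points where the previous error is zero
    ]
    return median(ratios)

def corrected_output(pred_t, y_prev, pred_prev, k):
    """Shift the current prediction by k times the previous prediction error."""
    return pred_t + k * (y_prev - pred_prev)

# Hypothetical measured values and model predictions for two adjacent periods.
k = correction_coefficient(
    y_cur=[13.0, 14.0, 15.0], pred_cur=[12.5, 13.5, 14.5],
    y_prev=[10.0, 11.0, 12.0], pred_prev=[9.0, 10.0, 11.0],
)
y_corr = corrected_output(pred_t=15.5, y_prev=15.0, pred_prev=14.5, k=k)
```

Using the median rather than the mean keeps a single extreme error ratio from dominating the coefficient.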
Further, the model parameter correction of the prediction model includes: taking the deviation between the model output value and the actual value as the optimization target, and optimizing the key parameters of the model with a genetic algorithm based on historical data, wherein the optimization target is as follows:
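The objective function itself does not survive in this text. As a rough sketch only, the example below assumes the target is the sum of squared deviations between model output and actual values, and uses a small real-coded genetic algorithm (tournament selection, arithmetic crossover, Gaussian mutation, elitism). The model `k * x + b`, the bounds, and all hyperparameters are hypothetical.

```python
import random

def genetic_minimize(loss, bounds, pop_size=30, generations=60,
                     mutation_rate=0.2, seed=0):
    """Minimize loss(params) over box bounds with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=loss)
        history.append(loss(pop[0]))
        children = [pop[0][:]]  # elitism: carry the best chromosome over unchanged
        while len(children) < pop_size:
            # Tournament selection: the better of two random chromosomes is a parent.
            p1 = min(rng.sample(pop, 2), key=loss)
            p2 = min(rng.sample(pop, 2), key=loss)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # arithmetic crossover
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # Gaussian mutation, clipped to bounds
                    child[g] = min(hi, max(lo, child[g] + rng.gauss(0.0, 0.1 * (hi - lo))))
            children.append(child)
        pop = children
    best = min(pop, key=loss)
    history.append(loss(best))
    return best, history

# Hypothetical historical data generated by y = 2x + 1.
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]

def sse(params):
    """Assumed objective: sum of squared deviations between model and data."""
    k, b = params
    return sum((y - (k * x + b)) ** 2 for x, y in zip(xs, ys))

best, history = genetic_minimize(sse, bounds=[(0.0, 5.0), (-5.0, 5.0)])
```

Because the best chromosome is always carried over, the recorded best loss never increases from one generation to the next.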
The beneficial effects of the invention are as follows:
1. Industrial data are collected through the industrial Internet platform, which solves the data silo problem in chemical enterprises and fully exploits the value of data from different systems.
2. In the data-driven modeling method based on the industrial Internet platform, the built-in machine learning algorithm library contains dozens of mainstream algorithms, so the model can better adapt to frequent changes in working conditions.
3. The method provided by the invention can greatly reduce the requirements of factories on measuring equipment, and has important significance for improving the product quality, promoting energy conservation and consumption reduction and accelerating the digital transformation of enterprises.
4. The method provided by the invention can predict the key indexes of chemical raw materials and products in real time, avoids the problems of long time consumption, difficult detection or incapability of detection of certain indexes and the like, and saves a great amount of time and resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;
FIG. 2 is a flow chart of a data-driven modeling business in a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the present invention;
FIG. 3 is a diagram of an industrial Internet platform technology architecture in a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the present invention;
FIG. 4 is a configuration diagram of an industrial Internet platform in a data-driven quality modeling method based on the industrial Internet platform according to an embodiment of the present invention.
Detailed Description
To further illustrate the embodiments, the invention provides accompanying drawings. The drawings form part of the disclosure, mainly illustrate the embodiments, and, together with the description, explain the operating principles of the embodiments. With reference to them, one of ordinary skill in the art will recognize other possible implementations and advantages of the invention. Elements in the drawings are not drawn to scale, and like reference numerals generally designate like elements.
According to an embodiment of the invention, a data-driven quality modeling method based on an industrial Internet platform is provided.
The invention is further described below with reference to the accompanying drawings and the detailed description. As shown in FIGS. 1-4, a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention comprises the following steps:
S1, collecting data from different systems based on the industrial Internet platform, and uniformly aggregating the data;
specifically, the collected data comprise quality index data from the quality business system and real-time production data from the production system;
S2, preprocessing the collected data;
S3, selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimension of the auxiliary variables by principal component analysis;
specifically, principal component analysis is a widely used dimension reduction method. While retaining as much of the information in the data as possible, it replaces many random variables with a few mutually uncorrelated composite factors, which are essentially linear combinations of the original variables, and thereby explains the variance-covariance structure of the group of variables. The weight of each principal component is determined by its contribution rate, that is, objectively by the information in the data, which overcomes the drawback of subjective weighting methods in which the weights are set manually;
S4, constructing a key product quality prediction model based on a data-driven modeling strategy;
specifically, the data-driven model is a process model built from a large amount of process data with a machine learning algorithm. It benefits from the massive real-time process data and laboratory analysis data provided by the distributed control system and the laboratory information management system of a chemical enterprise, which machine learning algorithms can mine in depth to establish the process model. The data-driven model requires little knowledge of the process mechanism in the training stage and, in the use stage, offers a small computational load, fast solution, and high accuracy within the data range on which the model was built; it has achieved good results in a variety of process modeling tasks and has attracted wide attention from researchers;
S5, performing deviation correction and model parameter correction on the established prediction model.
In one embodiment, the data preprocessing of the collected data comprises the following steps:
S201, fusing and storing the collected data to obtain sample data;
S202, eliminating abnormal data from the sample data, filtering the sample data, and normalizing the data.
In one embodiment, the calculation formula for fusing the collected data is as follows:
wherein $h_{1q}$ represents the data collected by the business system at time $t_{1q}$;
$h_{2q}$ represents the data collected by the production system at time $t_{2q}$;
$\varepsilon_{h1}$ represents the root mean square error of the collected data $h_{1q}$;
$\varepsilon_{t1}$ represents the root mean square error of the time $t_{1q}$;
$\varepsilon_{h2}$ represents the root mean square error of the collected data $h_{2q}$;
$\varepsilon_{t2}$ represents the root mean square error of the time $t_{2q}$;
$h_q$ represents the result of fusing the data collected by the business system and the production system at time $t_q$.
In one embodiment, abnormal data are eliminated from the sample data by screening with the 3σ criterion, with the following specific steps:
suppose an auxiliary variable $x$ in the sample data has n observations $x_1, x_2, \dots, x_n$; its mean $\bar{x}$ and standard deviation $\sigma$ are calculated:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
if an observation $x_i$ of the auxiliary variable in a sample satisfies the following formula:
$$\left|x_i - \bar{x}\right| > 3\sigma$$
then the sample is removed as an abnormal sample; the 3σ test is applied in turn to the other auxiliary variables in the sample, and the samples that pass the screening are added to the modeling sample set;
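A minimal sketch of the 3σ screening follows, assuming each sample is a row of auxiliary-variable values and that σ is the sample standard deviation (divisor n - 1), which this text does not state explicitly. The function name and data are hypothetical.

```python
from statistics import mean, stdev

def three_sigma_filter(samples):
    """Keep only the samples (rows) whose every auxiliary variable lies within 3 sigma of its mean."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]  # assumption: sample standard deviation (n - 1)
    return [
        row for row in samples
        if all(abs(v - m) <= 3 * s for v, m, s in zip(row, means, sigmas))
    ]

# Hypothetical data: 20 samples of two auxiliary variables, with one gross outlier.
samples = [[50.0 + 0.1 * (i % 5), 1.0 + 0.01 * (i % 3)] for i in range(20)]
samples[7][0] = 500.0
clean = three_sigma_filter(samples)
```

Note that a gross outlier inflates the mean and σ it is tested against, so with very few samples the criterion can fail to flag it; some implementations therefore repeat the screening until no sample is removed.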
In one embodiment, the sample data are filtered by applying mean filtering to the samples with the following formula:
$$X(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \dots + X(t) + \dots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$
wherein $t$ represents the sampling time;
$T$ represents the filtering time constant;
$T_c$ represents the sampling period.
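The mean filter can be sketched as a centered moving average. This sketch departs from the formula in two labeled ways: it divides by the number of terms actually summed (the window from t - T/2 to t + T/2 holds T/T_c + 1 points), and it repeats the end values at the boundaries, which the text does not address. The function name and signal are hypothetical.

```python
def mean_filter(x, T, Tc):
    """Centered moving average over a window of length T, sampled every Tc."""
    half = round(T / (2 * Tc))  # number of samples on each side of the current one
    out = []
    for i in range(len(x)):
        # Clamp indices at the ends so the window keeps its size near the boundaries.
        window = [x[min(max(j, 0), len(x) - 1)] for j in range(i - half, i + half + 1)]
        out.append(sum(window) / len(window))
    return out

# Hypothetical signal with a single spike; T = 2 and Tc = 1 give a 3-point window.
signal = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0]
smoothed = mean_filter(signal, T=2.0, Tc=1.0)
```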
In one embodiment, the data normalization maps the sample data to $[y_{\min}, y_{\max}]$ by the following formula:
$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$
wherein $y_{\min}$ and $y_{\max}$ represent the lower and upper bounds of the normalization target;
$x_{\min}$ and $x_{\max}$ represent the lower and upper bounds of the values of the current variable.
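As a small illustration of the normalization step, the sketch below assumes the usual min-max mapping onto a target interval. Returning the interval midpoint for a constant variable is an assumption of this sketch and is not covered by the text.

```python
def min_max_scale(values, y_min=-1.0, y_max=1.0):
    """Linearly map values from [min(values), max(values)] onto [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant variable is mapped to the middle of the target range.
        return [(y_min + y_max) / 2 for _ in values]
    scale = (y_max - y_min) / (x_max - x_min)
    return [scale * (x - x_min) + y_min for x in values]

scaled = min_max_scale([10.0, 15.0, 20.0], y_min=0.0, y_max=1.0)
```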
In one embodiment, the principal component analysis is calculated as follows:
1) Standardizing the original sample data and forming a standardized matrix:
let the m-dimensional random vector be $X = (X_1, X_2, \dots, X_m)^T$ with n samples $X_k = (X_{1k}, X_{2k}, \dots, X_{mk})^T$ $(k = 1, 2, \dots, n)$, where the superscript $T$ denotes the matrix transpose; the samples form the sample matrix, which is standardized as follows. Sample mean:
$$\bar{X}_i = \frac{1}{n}\sum_{k=1}^{n} X_{ik}$$
sample variance:
$$s_i^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(X_{ik} - \bar{X}_i\right)^2$$
the standardized data are:
$$\tilde{X}_{ik} = \frac{X_{ik} - \bar{X}_i}{s_i}$$
wherein $i = 1, 2, \dots, m$ and $k = 1, 2, \dots, n$,
forming the standardized matrix $\tilde{X} = (\tilde{X}_{ik})$;
2) Calculating the sample correlation coefficient matrix of the standardized matrix:
$$R = (r_{ij})_{m \times m}, \qquad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} \tilde{X}_{ik}\,\tilde{X}_{jk}$$
wherein $r_{ij}$ represents the element in row i, column j of matrix R $(i, j = 1, 2, \dots, m)$;
3) Determining the principal components:
solving the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix R yields m eigenvalues, wherein $\lambda$ represents an eigenvalue and $I_m$ represents the identity matrix; since R is a symmetric matrix, the eigenvalues are obtained by the Jacobi method; the value of p is determined according to
$$\frac{\sum_{j=1}^{p}\lambda_j}{\sum_{j=1}^{m}\lambda_j} \ge 0.85$$
so that the information utilization rate reaches 85% or more, giving p principal components; for each $\lambda_j$ $(j = 1, 2, \dots, p)$, the system of equations $Rb = \lambda_j b$ is solved to obtain the unit eigenvector $b_j$, where $b$ denotes an eigenvector;
4) Converting the standardized index variables into principal components:
$$U_j = b_j^{T}\tilde{X}, \qquad j = 1, 2, \dots, m$$
wherein $U_1$ is called the first principal component, $U_2$ the second principal component, and $U_m$ the m-th principal component;
5) Comprehensively evaluating the m principal components: the m principal components are weighted and summed to obtain the final evaluation value, the weight of each principal component being its variance contribution rate.
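Steps 1) to 4) can be sketched with NumPy as below. This is an illustration under stated assumptions rather than the patented implementation: samples are stored as rows (the transpose of the layout in the text), the divisor n - 1 is assumed for the variance and correlation, and `numpy.linalg.eigh` is swapped in for the Jacobi method since both apply to symmetric matrices. The function name and test data are hypothetical, and constant columns (zero variance) are not handled.

```python
import numpy as np

def pca_reduce(X, threshold=0.85):
    """X has shape (n_samples, m_variables). Returns (scores, eigenvalues, p)."""
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # step 1: standardize each variable
    R = Z.T @ Z / (n - 1)                               # step 2: correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # step 3: symmetric eigen-solver
    order = np.argsort(eigvals)[::-1]                   # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()          # cumulative contribution rate
    p = int(np.searchsorted(ratio, threshold) + 1)      # smallest p with ratio >= threshold
    scores = Z @ eigvecs[:, :p]                         # step 4: project onto p components
    return scores, eigvals, p

# Hypothetical data: four strongly correlated auxiliary variables driven by one factor.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.outer(t, [1.0, 2.0, -1.0, 3.0]) + 0.01 * rng.normal(size=(200, 4))
scores, eigvals, p = pca_reduce(X)
```

Because the four variables are almost perfectly correlated, a single component already exceeds the 85% threshold in this example.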
In one embodiment, the key product quality prediction model based on the data-driven modeling strategy is constructed by selecting an algorithm from the machine learning algorithm library built into the industrial Internet platform and modeling with the preprocessed data.
Specifically, the data-driven model may use any of the dozens of mainstream algorithms in the machine learning algorithm library, such as an artificial neural network or a least squares support vector machine;
The artificial neural network is a mathematical model that performs distributed parallel information processing by imitating the behavioral characteristics of biological neural networks. Relying on the complexity of the system, the network processes information by adjusting the interconnections among a large number of nodes. The artificial neural network has self-learning and self-adapting capabilities: from a set of corresponding input and output data provided in advance, it can analyze the underlying relations and rules between the two and ultimately form a complex nonlinear system function from those rules. Each input connection of a neuron has a synaptic connection strength, represented by a connection weight, by which the incoming signal is scaled, so that each input has an associated weight. The processing unit weights its inputs and then sums the weighted values to compute the output.
In artificial neural networks, the ability and efficiency of the network to solve problems depend largely on the activation function used, in addition to the network architecture. The choice of activation function strongly affects the convergence speed of the network, and different activation functions should be chosen for different practical problems. Commonly used activation functions take the following forms:
threshold function:
$$p = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
wherein p represents the dependent variable of the threshold function;
x represents the independent variable of the threshold function;
this function is also commonly called a step function. When the step function is used as the activation function, the output of the neuron is 1 or 0, reflecting excitation or inhibition of the neuron;
linear function: $y = kx + b$
wherein y represents the dependent variable of the linear function;
x represents the independent variable of the linear function;
k represents the slope of the linear function;
b represents the intercept of the linear function;
this function can be used as the activation function of an output neuron when the output may take any value;
logarithmic sigmoid function:
$$f(x) = \frac{1}{1 + e^{-x}}$$
wherein x represents the independent variable of the sigmoid function;
the output of the logarithmic sigmoid function lies between 0 and 1, so it is commonly chosen when the output signal must fall in the range 0 to 1; it is the most widely used activation function in neurons;
hyperbolic tangent sigmoid function:
$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
wherein x represents the independent variable of the hyperbolic tangent sigmoid function;
the hyperbolic tangent sigmoid function resembles a smoothed step function and has the same shape as the logarithmic sigmoid function, but it is symmetric about the origin and its output lies between -1 and 1, so it is commonly chosen when the output signal must fall in the range -1 to 1.
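The four activation functions and the weighted-sum behavior of a single neuron described above can be illustrated as follows. This is a sketch only: the function names are hypothetical, the threshold is assumed to sit at zero, and the `bias` argument of `neuron_output` is an added assumption that the text does not mention.

```python
import numpy as np

def threshold(x):
    """Step function: 1 where x >= 0, otherwise 0 (threshold at zero is assumed)."""
    return np.where(np.asarray(x, dtype=float) >= 0.0, 1.0, 0.0)

def linear(x, k=1.0, b=0.0):
    """Linear function y = k*x + b."""
    return k * np.asarray(x, dtype=float) + b

def log_sigmoid(x):
    """Logarithmic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def tanh_sigmoid(x):
    """Hyperbolic tangent sigmoid, output in (-1, 1), symmetric about the origin."""
    return np.tanh(np.asarray(x, dtype=float))

def neuron_output(inputs, weights, bias, activation):
    """Weight each input, sum the weighted values, then apply the activation.

    `bias` is a hypothetical extra term, not part of the original description.
    """
    return activation(np.dot(weights, inputs) + bias)
```

For example, `neuron_output(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.0, log_sigmoid)` weights the two inputs to a sum of zero and therefore returns 0.5, the midpoint of the logarithmic sigmoid.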
The least squares support vector machine (LSSVM) algorithm replaces the inequality constraints of the traditional support vector machine with equality constraints and takes the sum of squared errors as the training loss function, so that the quadratic programming problem of the support vector machine becomes a system of linear equations, which speeds up the solution;
the LSSVM optimization problem can be described by the following system of equations:
$$\min_{\omega, b, e} L(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2}$$
$$\text{s.t.}\quad y_i = \omega^{T}\varphi(x_i) + b + e_i,\quad i = 1, 2, \ldots, n$$
wherein L represents the loss function;
$\omega$ represents the weight vector;
$\gamma$ represents an adjustable parameter;
$e_i$ represents the error term;
$x_i$ represents the input data;
$y_i$ represents the output data;
$\varphi(\cdot)$ represents the mapping function;
b represents the bias term;
T represents the transpose;
i represents the index of the data (i = 1 to n);
n represents the total number of training data;
s.t. is the abbreviation for "subject to" and introduces the constraint;
solving the optimization problem by the Lagrangian method:
$$L(\omega, b, e, a) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2} - \sum_{i=1}^{n} a_i\left[\omega^{T}\varphi(x_i) + b + e_i - y_i\right]$$
the least squares support vector machine then takes the form $y(x) = \sum_{i=1}^{n} a_i k(x, x_i) + b$. The invention adopts the radial basis kernel function as the kernel function, wherein $k(x, x_i)$ is the kernel function and $a_i$ represents the Lagrange multiplier; $e_i$ represents the error term; n represents the total number of training data; i represents the index of the data (i = 1 to n);
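To show how the equality-constrained problem reduces to a linear system, the following sketch fits an LSSVM regressor with NumPy. It is illustrative only: the function names, the default values of `gamma` and `sigma`, and the kernel parameterization $k(x, x') = \exp(-\|x - x'\|^{2} / (2\sigma^{2}))$ are assumptions, since the source does not give the exact form of its radial basis kernel. Setting the derivatives of the Lagrangian to zero and eliminating the weight vector and the errors yields the block system solved in `lssvm_fit`.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Radial basis kernel matrix; the 2*sigma^2 scaling is an assumed convention."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LSSVM linear system
        [0   1^T        ] [b]   [0]
        [1   K + I/gamma] [a] = [y]
    and return the bias b and the Lagrange multipliers a."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, a, X_new, sigma=1.0):
    """Evaluate y(x) = sum_i a_i k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ a + b
```

A single call to `numpy.linalg.solve` replaces the quadratic program of a standard support vector machine, which is the speed advantage the text refers to. In practice `gamma` and `sigma` would be tuned, for instance by the genetic algorithm described later in this document.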
in one embodiment, performing deviation correction on the prediction model includes: while the model is running, correcting the model with new data by a deviation correction method based on the model prediction error, wherein the deviation correction method is calculated as follows:
$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$
wherein $\hat{y}_c(t)$ represents the corrected model output value at the current moment;
$\hat{y}(t)$ represents the predicted value output by the model at the current moment;
K represents the correction coefficient;
$y(t-1)$ and $\hat{y}(t-1)$ represent the real value and the predicted value output by the model at the previous moment, respectively;
t represents the sampling time;
the correction coefficient is obtained by dividing the model error of the current period by the model error of the previous period:
$$K_i = \frac{Y(t_i) - Y_m(t_i)}{Y(t_i - t) - Y_m(t_i - t)}$$
wherein $Y(t_i)$ represents the data within the current period;
$Y_m(t_i)$ represents the median value of the predicted values in the current period;
$Y(t_i - t)$ represents the data in the previous period;
$Y_m(t_i - t)$ represents the median value of the predicted values in the previous period;
$K = \mathrm{median}(K_i)$, that is, the correction coefficient is obtained by taking the median of the $K_i$.
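A small sketch of the deviation correction follows. It is a reading of the description rather than a confirmed implementation: the function names are hypothetical, the corrected output is taken to be the current prediction plus K times the previous prediction error, and samples of the current period are paired one-to-one with samples of the previous period, which is an assumption.

```python
import numpy as np

def correction_coefficient(y_cur, pred_cur, y_prev, pred_prev):
    """Median of the ratios (current-period error) / (previous-period error).

    Pairs with a zero previous-period error are skipped to avoid division by zero.
    """
    err_cur = np.asarray(y_cur, dtype=float) - np.asarray(pred_cur, dtype=float)
    err_prev = np.asarray(y_prev, dtype=float) - np.asarray(pred_prev, dtype=float)
    valid = err_prev != 0.0
    return float(np.median(err_cur[valid] / err_prev[valid]))

def corrected_output(pred_now, y_last, pred_last, K):
    """Corrected output: current prediction plus K times the previous error."""
    return pred_now + K * (y_last - pred_last)

# Usage: previous-period errors are all 1.0, current-period errors are 0.5, 0.5, 1.0.
K = correction_coefficient([13.0, 14.0, 15.0], [12.5, 13.5, 14.0],
                           [10.0, 11.0, 12.0], [9.0, 10.0, 11.0])
y_corrected = corrected_output(16.0, 15.0, 14.0, K)
```

Using the median rather than the mean keeps a single outlying ratio from dominating the coefficient.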
In one embodiment, performing model parameter correction on the prediction model includes: taking the deviation between the model output value and the actual value as the optimization target, and optimizing the key parameters of the model with a genetic algorithm based on historical data, wherein the optimization target is as follows:
A genetic algorithm starts its search from a set of randomly generated initial solutions, called the population. Each individual in the population is a solution to the problem, called a chromosome. These chromosomes evolve continuously over subsequent iterations, which is called inheritance. The genetic algorithm is implemented mainly through crossover, mutation and selection operations. Crossover or mutation operations generate the next generation of chromosomes, called offspring. The quality of a chromosome is measured by its fitness. A certain number of individuals are selected from the previous generation and the offspring according to fitness to form the next generation population, which continues to evolve; after a number of generations the algorithm converges to the best chromosome, which is likely to be an optimal or suboptimal solution of the problem. In genetic algorithms, fitness measures how close each individual in the population is likely to come to the optimal solution in the optimization calculation. The function that measures the fitness of an individual is called the fitness function, and its definition generally depends on the specific problem being solved.
The main operation procedure of the genetic algorithm, which uses three genetic operators (the selection operator, the crossover operator and the mutation operator), is as follows:
a. initialization: setting the generation counter v = 0; setting the maximum number of generations V; randomly generating H individuals as the initial population Q(0);
b. individual evaluation: calculating the fitness of each individual in the population Q(v);
c. selection operation: applying the selection operator to the population;
d. crossover operation: applying the crossover operator to the population;
e. mutation operation: applying the mutation operator to the population; after selection, crossover and mutation, the population Q(v) yields the next generation population Q(v+1);
f. termination check: if v ≤ V, then v = v + 1 and go to step b; if v > V, the individual with the greatest fitness obtained during evolution is output as the optimal solution and the calculation terminates.
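Steps a to f can be sketched as a small real-coded genetic algorithm. Everything specific here is an assumption made for illustration: the function names, binary tournament selection, arithmetic crossover, Gaussian mutation, the probabilities `pc` and `pm`, and the toy linear model whose two parameters are tuned. The fitness is the negative sum of squared deviations between model output and actual values on historical data, one possible form of the optimization target.

```python
import random

def tournament(rng, population, fits):
    """Selection operator: binary tournament, the fitter of two random individuals wins."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    return list(population[i] if fits[i] >= fits[j] else population[j])

def genetic_algorithm(fitness, bounds, H=30, V=50, pc=0.8, pm=0.1, seed=0):
    rng = random.Random(seed)
    # a. initialization: H random individuals form Q(0)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(H)]
    best, best_fit, history = None, float("-inf"), []
    for v in range(V + 1):
        # b. individual evaluation, remembering the fittest individual seen so far
        fits = [fitness(ind) for ind in population]
        for ind, f in zip(population, fits):
            if f > best_fit:
                best, best_fit = list(ind), f
        history.append(best_fit)
        # c. selection operation
        nxt = [tournament(rng, population, fits) for _ in range(H)]
        # d. crossover operation: arithmetic blend of consecutive pairs
        for k in range(0, H - 1, 2):
            if rng.random() < pc:
                w = rng.random()
                p1, p2 = nxt[k], nxt[k + 1]
                nxt[k] = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
                nxt[k + 1] = [(1 - w) * a + w * b for a, b in zip(p1, p2)]
        # e. mutation operation: Gaussian perturbation clipped to the bounds
        for ind in nxt:
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < pm:
                    ind[d] = min(hi, max(lo, ind[d] + rng.gauss(0.0, 0.1 * (hi - lo))))
        population = nxt  # Q(v+1)
    # f. termination: output the fittest individual found during evolution
    return best, best_fit, history

# Hypothetical historical data generated by y = 3x - 1; tune slope k and intercept b.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x - 1.0 for x in xs]

def deviation_fitness(params):
    k, b = params
    return -sum((k * x + b - y) ** 2 for x, y in zip(xs, ys))

best, best_fit, history = genetic_algorithm(deviation_fitness, [(-10.0, 10.0), (-10.0, 10.0)])
```

Because the best individual is tracked across all generations, the recorded best fitness never decreases even though the sketch does not carry elite individuals into the next population.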
In summary, with the above technical scheme, industrial data are collected on the industrial internet platform, which resolves the data silo problem in chemical enterprises and allows the value of data from different systems to be fully mined; in the data-driven modeling method based on the industrial internet platform, the built-in machine learning algorithm library contains dozens of mainstream algorithms, so the model can better adapt to frequent changes in working conditions; the method provided by the invention can greatly reduce a plant's need for measuring equipment and is of great significance for improving product quality, promoting energy saving and consumption reduction, and accelerating the digital transformation of enterprises; the method can predict key indexes of chemical raw materials and products in real time, avoiding problems such as time-consuming detection and indexes that are difficult or impossible to detect, and thereby saves a great deal of time and resources.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (8)
1. A data-driven quality modeling method based on an industrial internet platform, the method comprising the steps of:
S1, collecting data from different systems based on the industrial internet platform and aggregating the data uniformly;
S2, performing data preprocessing on the collected data;
S3, selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimension of the auxiliary variables by principal component analysis;
S4, constructing a key product quality prediction model based on a data-driven modeling strategy;
S5, performing deviation correction and model parameter correction on the established prediction model;
wherein the data preprocessing of the collected data comprises the following steps:
S201, fusing and storing the collected data to obtain sample data;
S202, performing abnormal data rejection and filtering on the sample data, and normalizing the data;
the calculation formula for fusing the collected data is as follows:
;
wherein,indicating that the business system is->Data collected at the moment;
indicating that the production system is->Data collected at the moment;
representing acquisition data +.>Root mean square error of (a);
indicating time->Root mean square error of (a);
representing acquisition data +.>Root mean square error of (a);
indicating time->Root mean square error of (a);
indicating that business system and production system are in->And collecting the data fusion result at the moment.
2. The data-driven quality modeling method based on the industrial internet platform according to claim 1, wherein the abnormal data rejection on the sample data is performed by screening with the 3σ criterion, comprising the following specific steps:
assuming that an auxiliary variable x in the sample data has n values in total, with the sequence $x_1, x_2, \ldots, x_n$, calculating the average $\bar{x}$ and the standard deviation $\sigma$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$;
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}$$;
if the auxiliary variable x in a sample satisfies the following formula:
$$\left|x_i - \bar{x}\right| > 3\sigma$$;
then the sample is removed as an abnormal sample; the 3σ judgment is applied in turn to the other auxiliary variables in the sample, and the samples that pass the screening are selected into the modeling sample set.
3. The method of claim 1, wherein the filtering processing of the sample data applies average filtering to the samples according to the following formula:
;
wherein t represents a sampling time;
T represents the filtering time constant;
representing the sampling period.
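4. The industrial internet platform-based data-driven quality modeling method of claim 1, wherein normalizing the data normalizes the sample data by the following formula:
$$y = \frac{\left(y_{\max} - y_{\min}\right)\left(x - x_{\min}\right)}{x_{\max} - x_{\min}} + y_{\min}$$;
wherein $y_{\max}$ and $y_{\min}$ represent the upper and lower bounds of the normalization target;
$x_{\max}$ and $x_{\min}$ represent the upper and lower bounds of the current variable value.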
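5. The data-driven quality modeling method based on the industrial internet platform according to claim 1, wherein the principal component analysis method comprises the following steps:
1) Standardizing the original sample data and forming a standardized matrix:
let the m-dimensional random vector be $x = (x_1, x_2, \ldots, x_m)^{T}$; for n samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^{T}$, $i = 1, 2, \ldots, n$, where T denotes matrix transposition, a sample matrix is formed and standardized; the sample mean is calculated as:
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$;
sample variance:
$$s_j^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^{2}$$;
the standardized data are:
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$$;
wherein $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$;
forming the standardized matrix $Z$;
2) Calculating the sample correlation coefficient matrix of the standardized matrix:
$$R = \left[r_{ij}\right]_{m \times m} = \frac{Z^{T} Z}{n-1}$$;
wherein $r_{ij}$ represents the element in row i, column j of the matrix R, $r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki} z_{kj}$, $i, j = 1, 2, \ldots, m$;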
3) Determining the principal components:
solving the characteristic equation $|R - \lambda I| = 0$ of the sample correlation matrix $R$ to obtain $m$ characteristic roots, wherein $\lambda$ represents an eigenvalue and $I$ represents the identity matrix; since $R$ is a symmetric matrix, the eigenvalues are obtained by the Jacobi method; the value of $p$ is determined according to $\sum_{j=1}^{p}\lambda_j / \sum_{j=1}^{m}\lambda_j \ge 0.85$, so that the information utilization rate reaches 85% or more, giving $p$ principal components; for each $\lambda_j$ ($j = 1, 2, \ldots, p$), the equation set $Rb = \lambda_j b$ is solved to obtain the unit eigenvector $b_j$, where $b$ represents the set of eigenvectors;
4) Converting the standardized index variables into principal components:
$$U_j = z^{T} b_j$$;
wherein $z$ represents the vector of standardized index variables; $U_1$ is called the first principal component, $U_2$ is called the second principal component, and $U_m$ is called the m-th principal component;
5) Comprehensively evaluating the m principal components: a weighted summation of the m principal components gives the final evaluation value, wherein the weight of each principal component is its variance contribution rate.
6. The data-driven quality modeling method based on the industrial internet platform according to claim 1, wherein constructing the key product quality prediction model based on the data-driven modeling strategy comprises selecting an algorithm from the machine learning algorithm library built into the industrial internet platform and modeling with the preprocessed data.
7. The method of claim 1, wherein performing deviation correction on the prediction model comprises: while the model is running, correcting the model with new data by a deviation correction method based on the model prediction error, wherein the deviation correction method is calculated as follows:
$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$;
wherein $\hat{y}_c(t)$ represents the corrected model output value at the current moment;
$\hat{y}(t)$ represents the predicted value output by the model at the current moment;
K represents the correction coefficient;
$y(t-1)$ and $\hat{y}(t-1)$ represent the real value and the predicted value output by the model at the previous moment, respectively;
t represents the sampling time;
the correction coefficient is obtained by dividing the model error of the current period by the model error of the previous period:
$$K_i = \frac{Y(t_i) - Y_m(t_i)}{Y(t_i - t) - Y_m(t_i - t)}$$;
wherein $Y(t_i)$ represents the data within the current period;
$Y_m(t_i)$ represents the median value of the predicted values in the current period;
$Y(t_i - t)$ represents the data in the previous period;
$Y_m(t_i - t)$ represents the median value of the predicted values in the previous period;
the correction coefficient $K$ is obtained by taking the median of the $K_i$.
8. The method of claim 1, wherein performing model parameter correction on the prediction model comprises: taking the deviation between the model output value and the actual value as the optimization target, and optimizing the key parameters of the model with a genetic algorithm based on historical data, wherein the optimization target is as follows:
。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310408969.1A CN116523388B (en) | 2023-04-17 | 2023-04-17 | Data-driven quality modeling method based on industrial Internet platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116523388A CN116523388A (en) | 2023-08-01 |
CN116523388B true CN116523388B (en) | 2023-11-10 |
Family
ID=87391371
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310408969.1A Active CN116523388B (en) | 2023-04-17 | 2023-04-17 | Data-driven quality modeling method based on industrial Internet platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116523388B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012027683A (en) * | 2010-07-23 | 2012-02-09 | Nippon Steel Corp | Quality prediction device, quality prediction method, program and computer readable recording medium |
CN108647481A (en) * | 2018-08-14 | 2018-10-12 | 华东理工大学 | A kind of rotary kiln burning zone temperature flexible measurement method |
CN109657411A (en) * | 2019-01-18 | 2019-04-19 | 华东理工大学 | A kind of solvent deasphalting unit modeling and optimization method based on data-driven |
CN110210687A (en) * | 2019-06-13 | 2019-09-06 | 中南大学 | A kind of Nonlinear Dynamic production process product quality prediction technique returned based on local weighted slow feature |
CN110405343A (en) * | 2019-08-15 | 2019-11-05 | 山东大学 | A kind of laser welding process parameter optimization method of the prediction model integrated based on Bagging and particle swarm optimization algorithm |
CN111291937A (en) * | 2020-02-25 | 2020-06-16 | 合肥学院 | Method for predicting quality of treated sewage based on combination of support vector classification and GRU neural network |
CN111428201A (en) * | 2020-03-27 | 2020-07-17 | 陕西师范大学 | Prediction method for time series data based on empirical mode decomposition and feedforward neural network |
WO2021063136A1 (en) * | 2019-09-30 | 2021-04-08 | 江苏大学 | Data-driven high-precision integrated navigation data fusion method |
CN113569993A (en) * | 2021-08-27 | 2021-10-29 | 浙江工业大学 | Method for constructing quality prediction model in polymerization reaction process |
CN114997486A (en) * | 2022-05-26 | 2022-09-02 | 南京工业大学 | Effluent residual chlorine prediction method of water works based on width learning network |
CN115545321A (en) * | 2022-10-14 | 2022-12-30 | 云南中烟工业有限责任公司 | On-line prediction method for process quality of silk making workshop |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105205224B (en) * | 2015-08-28 | 2018-10-30 | 江南大学 | Time difference Gaussian process based on fuzzy curve analysis returns soft-measuring modeling method |
EP3471027A1 (en) * | 2017-10-13 | 2019-04-17 | Siemens Aktiengesellschaft | A method for computer-implemented determination of a data-driven prediction model |
CN109960873B (en) * | 2019-03-24 | 2021-09-10 | 北京工业大学 | Soft measurement method for dioxin emission concentration in urban solid waste incineration process |
US20220147672A1 (en) * | 2019-05-17 | 2022-05-12 | Tata Consultancy Services Limited | Method and system for adaptive learning of models for manufacturing systems |
CN112625758B (en) * | 2019-09-24 | 2023-01-13 | 中国石油化工股份有限公司 | Intelligent gasification batching system and method |
CN113128793A (en) * | 2021-05-19 | 2021-07-16 | 中国南方电网有限责任公司 | Photovoltaic power combination prediction method and system based on multi-source data fusion |
Non-Patent Citations (3)
Title |
---|
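Application of data-driven technology in petrochemical industry operation; Feng Dachun; Lu Hong; Automation in Petro-Chemical Industry (06); full text *
Research on data-driven quality prediction and quality rule mining methods for complex products; Information Science and Technology; Information Science and Technology; full text *
Research on ensemble just-in-time learning soft sensor modeling methods; Li Jiangang; Engineering Science and Technology I (No. 06); full text *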
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||