CN116523388B

CN116523388B - Data-driven quality modeling method based on industrial Internet platform

Info

Publication number: CN116523388B
Application number: CN202310408969.1A
Authority: CN
Inventors: 王峰; 顾毅; 熊亮; 张莹; 郑锦泉
Original assignee: Wuxi Xuelang Shuzhi Technology Co ltd
Current assignee: Wuxi Xuelang Shuzhi Technology Co ltd
Priority date: 2023-04-17
Filing date: 2023-04-17
Publication date: 2023-11-10
Anticipated expiration: 2043-04-17
Also published as: CN116523388A

Abstract

The invention discloses a data-driven quality modeling method based on an industrial Internet platform, comprising the following steps: collecting data from different systems via the industrial Internet platform and consolidating them; preprocessing the collected data; selecting auxiliary variables according to process principles and process characteristics, and reducing their dimensionality by principal component analysis; constructing a prediction model for key product quality using a data-driven modeling strategy; and applying bias correction and model parameter correction to the established prediction model. The method can substantially reduce a plant's reliance on measuring equipment, which is significant for improving product quality, saving energy and reducing consumption, and accelerating the digital transformation of enterprises. It can also predict key indexes of chemical raw materials and products in real time, avoiding cases where an index is time-consuming, difficult, or impossible to measure, and thereby saves considerable time and resources.

Description

Data-driven quality modeling method based on industrial Internet platform

Technical Field

The invention relates to the field of the industrial Internet, and in particular to a data-driven quality modeling method based on an industrial Internet platform.

Background

In petrochemical processes, simulation, control, and optimization often rely on high-performance models. With intensifying market competition and stricter environmental requirements in recent years, enterprises urgently need to extract as much economic benefit as possible from limited resources. This places new demands on process control and optimization and makes modeling more difficult, particularly for strongly nonlinear, time-varying objects such as the biological parameters of fermentation processes and the physical and chemical parameters of chemical processes carried out in continuous stirred tank reactors. Continuous stirred tank reactors (CSTRs), for example, are widely used in polymerization chemistry; they are core equipment in chemical production and are also common in the dye, pharmaceutical reagent, food, and synthetic materials industries. Nevertheless, automatic control of the reactions they host has advanced slowly, mainly because these reactions involve many interacting physical and chemical effects that make the process highly nonlinear and therefore very difficult to model.

In actual chemical production, a lack of technical means and hardware means that the core production system cannot feed back all required process parameters in real time, yet better control of a reaction requires information from within the reaction process. Variables such as temperature, pressure, liquid level, and volume are relatively easy to measure in real time, whereas parameters such as reactant concentration lack reliable online sensors, and measuring them is costly. As a result, many industrial production systems cannot rely on fault diagnosis and state detection to improve operating safety, which also causes considerable trouble for product quality. In addition, other factors in the production process, such as temperature and the concentration of feedstock in the reactor, may fluctuate, introducing uncertainty into the model.

The arrival of big data and the industrial Internet has opened a new route to intelligent chemical production through algorithms such as data mining and machine learning. A data-driven quality modeling method based on an industrial Internet platform is more flexible and more closely tied to actual operating conditions; using its strong learning and representation capabilities, it can fully mine the important information in historical data and build accurate prediction models for key raw-material and product quality indexes.

No effective solution to these problems in the related art has yet been proposed.

Disclosure of Invention

In view of the problems in the related art, the invention provides a data-driven quality modeling method based on an industrial Internet platform in order to overcome the above technical problems.

For this purpose, the invention adopts the following specific technical scheme:

A data-driven quality modeling method based on an industrial Internet platform comprises the following steps:

S1, collecting data from different systems via the industrial Internet platform and consolidating them;

S2, preprocessing the collected data;

S3, selecting auxiliary variables according to process principles and process characteristics, and reducing their dimensionality by principal component analysis;

S4, constructing a key product quality prediction model using a data-driven modeling strategy;

S5, performing bias correction and model parameter correction on the established prediction model.

Further, preprocessing the collected data comprises the following steps:

S201, merging (fusing) the collected data and storing the result to obtain sample data;

S202, removing abnormal data from the sample data, filtering it, and normalizing it.

Further, the collected data are fused using the following formula:

where $h_{1q}$ denotes the data collected by the business system at time $t_{1q}$;

$h_{2q}$ denotes the data collected by the production system at time $t_{2q}$;

$\varepsilon_{h1}$ denotes the root mean square error of the collected data $h_{1q}$;

$\varepsilon_{t1}$ denotes the root mean square error of time $t_{1q}$;

$\varepsilon_{h2}$ denotes the root mean square error of the collected data $h_{2q}$;

$\varepsilon_{t2}$ denotes the root mean square error of time $t_{2q}$;

$h_q$ denotes the fused result of the data collected by the business system and the production system at time $t_q$.
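The text defines the variables of the fusion formula but does not reproduce the formula itself, so the sketch below does not implement it. As a hedged illustration only, it substitutes a common alternative, inverse-variance weighting, in which each source is weighted by the reciprocal of its assumed total error variance. Treating $\varepsilon_h^2 + \varepsilon_t^2$ as that variance (with the timing error already converted into the units of $h$) is an assumption, and `fuse_two_sources` is a hypothetical name.

```python
def fuse_two_sources(h1: float, eps_h1: float, eps_t1: float,
                     h2: float, eps_h2: float, eps_t2: float) -> float:
    """Inverse-variance weighted fusion of two readings of the same quantity.

    ASSUMPTION (not the patent's formula): each source's total variance is
    eps_h**2 + eps_t**2, with eps_t already expressed in the units of h.
    """
    var1 = eps_h1 ** 2 + eps_t1 ** 2  # business-system reading h1
    var2 = eps_h2 ** 2 + eps_t2 ** 2  # production-system reading h2
    if var1 <= 0 or var2 <= 0:
        raise ValueError("error terms must give a positive variance")
    w1, w2 = 1.0 / var1, 1.0 / var2   # the less noisy source gets more weight
    return (w1 * h1 + w2 * h2) / (w1 + w2)
```

For example, `fuse_two_sources(10.0, 1.0, 0.0, 20.0, 2.0, 0.0)` returns 12.0: the production reading has four times the variance, so it receives a quarter of the weight.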

Furthermore, abnormal data are removed from the sample data by screening according to the 3σ criterion, as follows:

suppose an auxiliary variable $x$ has $n$ observations $x_1, x_2, \ldots, x_n$ in the sample data; compute their mean $\bar{x}$ and standard deviation $\sigma$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

if a value $x_i$ of the auxiliary variable in a sample satisfies

$$|x_i - \bar{x}| > 3\sigma,$$

the sample is removed as an abnormal sample. The 3σ check is then applied in turn to the other auxiliary variables in the sample, and the samples that pass are included in the modeling sample set;
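A minimal sketch of the 3σ screening described above, assuming the samples are rows and the auxiliary variables are columns. Each column's mean and standard deviation are computed once on the full data set (the text does not say whether they are recomputed after removals), and `statistics.stdev` uses the n − 1 denominator; both are assumptions, as is the function name.

```python
from statistics import mean, stdev

def three_sigma_filter(samples: list[list[float]]) -> list[list[float]]:
    """Drop every sample (row) in which any auxiliary variable (column)
    lies more than 3 standard deviations from that column's mean."""
    if not samples:
        return []
    keep = [True] * len(samples)
    for j in range(len(samples[0])):          # check each auxiliary variable in turn
        column = [row[j] for row in samples]
        mu, sigma = mean(column), stdev(column)
        for i, x in enumerate(column):
            if abs(x - mu) > 3 * sigma:
                keep[i] = False               # the whole sample is rejected
    return [row for row, k in zip(samples, keep) if k]
```

Note that with very few samples no point can exceed 3σ, because the outlier itself inflates σ; the criterion only becomes effective once the sample set has a dozen or more points.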

Further, the sample data are smoothed by mean (moving-average) filtering according to the following formula:

$$\bar{X}(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \cdots + X(t) + \cdots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$

where $\bar{X}(t)$ is the filtered value and $t$ denotes the sampling time;

$T$ denotes the filtering time constant;

$T_c$ denotes the sampling period.
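A short sketch of the centered mean filter, assuming evenly spaced samples. Counting the terms from $t - T/2$ to $t + T/2$ in steps of $T_c$ gives $T/T_c + 1$ samples, so this sketch divides by the number of samples actually averaged rather than by $T/T_c$, and it shrinks the window at the ends of the series; both choices, and the function name, are assumptions.

```python
def centered_moving_average(x: list[float], window_len: float,
                            sample_period: float) -> list[float]:
    """Average each point with the samples up to window_len/2 before and after it."""
    half = int(round(window_len / (2 * sample_period)))  # samples on each side of t
    out = []
    for t in range(len(x)):
        lo, hi = max(0, t - half), min(len(x), t + half + 1)  # truncate at the edges
        segment = x[lo:hi]
        out.append(sum(segment) / len(segment))
    return out
```

With $T = 2$ and $T_c = 1$, `centered_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 2.0, 1.0)` averages three neighbours in the interior and two at each end, giving `[1.5, 2.0, 3.0, 4.0, 4.5]`.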

Further, the data are normalized by mapping the sample data to $[y_{\min}, y_{\max}]$ with the following formula:

$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$

where $y_{\min}$ and $y_{\max}$ are the lower and upper bounds of the normalization target range;

$x_{\min}$ and $x_{\max}$ are the lower and upper bounds of the current variable.
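The min-max mapping above can be sketched as follows. The default target range $[-1, 1]$ is an arbitrary assumption, and the bounds are taken from the values passed in.

```python
def min_max_scale(values: list[float], y_min: float = -1.0,
                  y_max: float = 1.0) -> list[float]:
    """Map values linearly so that min(values) -> y_min and max(values) -> y_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("constant variable cannot be min-max normalized")
    return [(y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min for x in values]
```

For example, `min_max_scale([0, 5, 10], 0, 1)` gives `[0.0, 0.5, 1.0]`. In an online soft sensor, new data would normally be scaled with the $x_{\min}$ and $x_{\max}$ stored from the training set rather than recomputed, so that the model sees a consistent scale.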

Further, the principal component analysis is computed as follows:

1) Standardize the original sample data and form the standardized matrix:

let $X = (X_1, X_2, \ldots, X_m)^T$ be an m-dimensional random vector with $n$ samples $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$ $(i = 1, 2, \ldots, n)$, where $T$ denotes matrix transposition. Form the sample matrix and standardize it. The sample mean is

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$

the sample variance is

$$s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$$

and the standardized data are

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m$$

forming the standardized matrix $Z = (z_{ij})$;

2) Compute the sample correlation coefficient matrix of the standardized matrix:

$$R = (r_{ij})_{m \times m}, \quad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki} z_{kj}$$

where $r_{ij}$ is the element in row $i$, column $j$ of $R$ $(i, j = 1, 2, \ldots, m)$;

3) Determine the principal components:

solve the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix $R$ to obtain its $m$ eigenvalues, where $\lambda$ denotes an eigenvalue and $I_m$ the identity matrix. Because $R$ is symmetric, the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ can be obtained by the Jacobi method. Choose $p$ according to

$$\frac{\sum_{j=1}^{p} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \ge 85\%$$

so that the cumulative contribution rate (information utilization rate) reaches at least 85%, giving $p$ principal components. For each $\lambda_j$ $(j = 1, 2, \ldots, p)$, solve $Rb = \lambda_j b$ to obtain the unit eigenvector $b_j$, where $b$ denotes an eigenvector;

4) Convert the standardized index variables into principal components:

$$U_j = z^T b_j, \quad j = 1, 2, \ldots, p$$

where $z$ is a standardized sample vector, $U_1$ is called the first principal component, $U_2$ the second principal component, and in general $U_j$ the j-th principal component;

5) Evaluate the principal components comprehensively: the final evaluation value is the weighted sum of the principal components, where the weight of each is its variance contribution rate.
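A minimal NumPy sketch of steps 1) to 5), assuming the data arrive as an n × m array (rows are samples, columns are auxiliary variables). It uses `numpy.linalg.eigh` for the symmetric eigenproblem instead of the Jacobi method named in the text, uses the n − 1 sample standard deviation, and forms the composite score from the p retained components only; these choices and the function name are assumptions.

```python
import numpy as np

def pca_composite(X: np.ndarray, threshold: float = 0.85):
    """PCA on the correlation matrix of X (n samples x m variables)."""
    n = X.shape[0]
    # 1) standardize each variable (sample standard deviation, n - 1)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2) sample correlation coefficient matrix R (m x m)
    R = Z.T @ Z / (n - 1)
    # 3) eigen-decomposition of symmetric R (eigh stands in for the Jacobi method)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()                 # variance contribution rates
    p = int(np.searchsorted(np.cumsum(contrib), threshold)) + 1
    p = min(p, len(eigvals))                          # guard against rounding near 100 %
    # 4) principal component scores U_j = Z b_j for the p retained components
    U = Z @ eigvecs[:, :p]
    # 5) composite evaluation: weighted sum with contribution rates as weights
    composite = U @ contrib[:p]
    return p, U, composite
```

With two perfectly correlated columns and a third uncorrelated one, the eigenvalues of $R$ are 2, 1 and 0, so the cumulative contribution passes 85% at the second component and `p` is 2.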

Furthermore, the key product quality prediction model is constructed with the data-driven modeling strategy by applying an algorithm from the machine learning algorithm library built into the industrial Internet platform to the preprocessed data.

Further, bias correction of the prediction model comprises: while the model is running, correcting it with new data by a bias correction method driven by the model prediction error, calculated as follows:

$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$

where $\hat{y}_c(t)$ denotes the corrected model output at the current time;

$\hat{y}(t)$ denotes the predicted value output by the model at the current time;

$K$ denotes the correction coefficient;

$y(t-1)$ and $\hat{y}(t-1)$ denote the actual value and the model's predicted value at the previous time;

$t$ denotes the sampling time;

each coefficient $K_i$ is obtained by dividing the model error of the current period by the model error of the previous period:

where $Y(t_i)$ denotes data within the current period;

a further term denotes the average of the predicted values in the current period;

$Y(t_i - t)$ denotes data within the previous period;

$Y_m(t_i - t)$ denotes the median of the predicted values in the previous period;

$K = \operatorname{median}(K_i)$, i.e. the correction coefficient is obtained by taking the median of the $K_i$.
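A hedged sketch of the bias correction, assuming the additive update $\hat{y}_c(t) = \hat{y}(t) + K[y(t-1) - \hat{y}(t-1)]$, which is consistent with the variables defined above. For simplicity the sketch pairs current-period and previous-period errors sample by sample to form the ratios $K_i$; that pairing, the skipping of zero denominators, the fallback of no correction, and the function names are assumptions.

```python
from statistics import median

def correction_coefficient(cur_errors: list[float], prev_errors: list[float]) -> float:
    """K = median of K_i = (current-period error) / (previous-period error)."""
    # ASSUMPTION: errors are paired by index; a zero previous error is skipped.
    ratios = [c / p for c, p in zip(cur_errors, prev_errors) if p != 0]
    return median(ratios) if ratios else 0.0  # no usable ratio: apply no correction

def corrected_prediction(y_pred_now: float, y_true_prev: float,
                         y_pred_prev: float, k: float) -> float:
    """Current prediction shifted by K times the previous residual."""
    return y_pred_now + k * (y_true_prev - y_pred_prev)
```

The median makes $K$ robust to a single period in which the previous error was close to zero and the ratio blew up. For example, with ratios 0.5, 1.0 and 1.5 the coefficient is 1.0, and `corrected_prediction(10.0, 5.0, 4.0, 0.5)` returns 10.5.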

Further, model parameter correction of the prediction model comprises: taking the deviation between the model output and the actual value as the optimization objective, and optimizing the key parameters of the model with a genetic algorithm based on historical data, where the optimization objective is as follows:

The beneficial effects of the invention are as follows:

1. Industrial data are collected via the industrial Internet platform, which resolves the data-silo problem in chemical enterprises and fully exploits the value of data from different systems.

2. In the data-driven modeling method based on the industrial Internet platform, the built-in machine learning algorithm library contains dozens of mainstream algorithms, so the model can better adapt to frequent changes in operating conditions.

3. The method can substantially reduce a plant's reliance on measuring equipment, which is significant for improving product quality, saving energy and reducing consumption, and accelerating the digital transformation of enterprises.

4. The method can predict key indexes of chemical raw materials and products in real time, avoiding cases where certain indexes are time-consuming, difficult, or impossible to measure, and saves considerable time and resources.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;

FIG. 2 is a flow chart of the data-driven modeling business process in the data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;

FIG. 3 is a technical architecture diagram of the industrial Internet platform in the data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;

FIG. 4 is a configuration diagram of the industrial Internet platform in the data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention.

Detailed Description

To further illustrate the embodiments, the invention provides the accompanying drawings. They form part of the disclosure, mainly illustrate the embodiments, and together with the description serve to explain the principles of the embodiments. With reference to them, a person skilled in the art will recognize other possible implementations and advantages of the invention. Elements in the drawings are not drawn to scale, and like reference numerals generally designate like elements.

According to an embodiment of the invention, a data-driven quality modeling method based on an industrial Internet platform is provided.

The invention is further described below with reference to the accompanying drawings and the detailed description. As shown in FIGS. 1-4, a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention comprises the following steps:

S1, collecting data from different systems via the industrial Internet platform and consolidating them;

specifically, the collected data comprise quality index data from the quality business system and real-time production data from the production system;

S2, preprocessing the collected data;

S3, selecting auxiliary variables according to process principles and process characteristics, and reducing their dimensionality by principal component analysis;

specifically, principal component analysis is a widely used dimensionality reduction method. While retaining as much of the data's information as possible, it replaces many random variables with a few mutually uncorrelated composite factors, which are essentially linear combinations of the original variables, and uses them to explain the variance-covariance structure of that set of variables. The weight of each principal component is its contribution rate, which is determined objectively by the information in the data; this overcomes the drawback of subjective weighting methods, in which weights are set manually;

S4, constructing a key product quality prediction model using a data-driven modeling strategy;

specifically, a data-driven model is a process model built from large amounts of process data with machine learning algorithms. It benefits from the massive real-time process data and laboratory analysis data provided by the distributed control systems and laboratory information management systems of chemical enterprises, so that a process model can be established by mining these data in depth with machine learning algorithms. A data-driven model requires little process-mechanism knowledge during training; in use it has a small computational load, fast solution, and high accuracy within the data range on which it was built. It has performed well in a variety of process modeling tasks and has attracted wide attention from researchers;

S5, performing bias correction and model parameter correction on the established prediction model.

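In one embodiment, preprocessing the collected data comprises the following steps:

S201, merging (fusing) the collected data and storing the result to obtain sample data;

S202, removing abnormal data from the sample data, filtering it, and normalizing it.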

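In one embodiment, the collected data are fused using the following formula:

where $h_{1q}$ denotes the data collected by the business system at time $t_{1q}$;

$h_{2q}$ denotes the data collected by the production system at time $t_{2q}$;

$\varepsilon_{h1}$ denotes the root mean square error of the collected data $h_{1q}$;

$\varepsilon_{t1}$ denotes the root mean square error of time $t_{1q}$;

$\varepsilon_{h2}$ denotes the root mean square error of the collected data $h_{2q}$;

$\varepsilon_{t2}$ denotes the root mean square error of time $t_{2q}$;

$h_q$ denotes the fused result of the data collected by the business system and the production system at time $t_q$.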

In one embodiment, abnormal data are removed from the sample data by screening according to the 3σ criterion, as follows:

if a value $x_i$ of an auxiliary variable in a sample satisfies $|x_i - \bar{x}| > 3\sigma$, where $\bar{x}$ and $\sigma$ are the mean and standard deviation of that variable, the sample is removed as an abnormal sample.

In one embodiment, the sample data are smoothed by mean (moving-average) filtering according to the following formula:

$$\bar{X}(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \cdots + X(t) + \cdots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$

where $\bar{X}(t)$ is the filtered value and $t$ denotes the sampling time;

$T$ denotes the filtering time constant;

$T_c$ denotes the sampling period.

In one embodiment, the data are normalized by mapping the sample data to $[y_{\min}, y_{\max}]$ with the following formula:

$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$

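In one embodiment, the principal component analysis is computed as follows:

1) Standardize the original sample data and form the standardized matrix:

the sample mean is

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$

the sample variance is

$$s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$$

and the standardized data are

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m$$

forming the standardized matrix $Z = (z_{ij})$;

2) Compute the sample correlation coefficient matrix $R$ of the standardized matrix;

3) Determine the principal components;

4) Convert the standardized index variables into principal components.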

In one embodiment, the key product quality prediction model is built with the data-driven modeling strategy by applying an algorithm from the machine learning algorithm library built into the industrial Internet platform to the preprocessed data.

Specifically, the data-driven model can use any of the dozens of mainstream algorithms in the machine learning algorithm library, such as artificial neural networks and the least squares support vector machine;

An artificial neural network is a mathematical model that processes information in a distributed, parallel manner by imitating the behavior of biological neural networks. Depending on the complexity of the system, it processes information by adjusting the interconnections among a large number of nodes. It is capable of self-learning and self-adaptation: from a set of corresponding input and output data provided in advance, it analyzes the underlying relationships and regularities between them and uses these to form a complex nonlinear function. Each input connection of a neuron has a synaptic strength, represented by a connection weight, by which the incoming signal is scaled, so each input has an associated weight. The processing unit weights the inputs and sums the weighted values to compute its output.

Besides the network architecture, the ability and efficiency of an artificial neural network in solving problems depend largely on its activation function. The choice of activation function strongly affects convergence speed and should be made according to the practical problem at hand. Commonly used activation functions take the following forms:

Threshold function:

$$p = f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $p$ denotes the dependent variable of the threshold function;

$x$ denotes the independent variable of the threshold function;

this function is also commonly called the step function. When the step function is used as the activation function, the neuron outputs 1 or 0, representing excitation or inhibition of the neuron;

Linear function: $y = kx + b$

where $y$ denotes the dependent variable of the linear function;

$x$ denotes the independent variable of the linear function;

$k$ denotes the slope of the linear function;

$b$ denotes the intercept of the linear function;

this function can be used as the activation function of an output neuron when the output may take any value;

Logarithmic sigmoid (log-sigmoid) function:

$$f(x) = \frac{1}{1 + e^{-x}}$$

where $x$ denotes the independent variable of the sigmoid function;

the output of the log-sigmoid function lies between 0 and 1, so it is often chosen when the output signal must lie in the range 0 to 1; it is the most widely used activation function in neurons;

Hyperbolic tangent sigmoid function:

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

where $x$ denotes the independent variable of the hyperbolic tangent sigmoid function;

the hyperbolic tangent sigmoid function resembles a smoothed step function and has the same shape as the log-sigmoid function, but it is symmetric about the origin and its output lies between -1 and 1, so it is often used when the output signal must lie in the range -1 to 1.
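The four activation functions, and the weighted-sum neuron they plug into, can be sketched as follows. The bias term in `neuron_output`, the choice of returning 1 at exactly $x = 0$ in the step function, and all function names are illustrative assumptions; the log-sigmoid is split by sign so that `math.exp` never overflows.

```python
import math

def threshold(x: float) -> float:
    """Step function: 1 (excitation) for x >= 0, else 0 (inhibition)."""
    return 1.0 if x >= 0 else 0.0

def linear(x: float, k: float = 1.0, b: float = 0.0) -> float:
    """Linear activation y = kx + b, for output neurons with an unbounded range."""
    return k * x + b

def log_sigmoid(x: float) -> float:
    """1 / (1 + e^-x), output between 0 and 1, written to avoid exp overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh_sigmoid(x: float) -> float:
    """Hyperbolic tangent, output between -1 and 1, symmetric about the origin."""
    return math.tanh(x)

def neuron_output(inputs, weights, bias, activation):
    """Weight each input, sum the weighted values plus a bias, then activate."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)
```

For instance, inputs `[1.0, 2.0]` with weights `[0.5, -0.25]` give a net input of exactly 0, so a threshold neuron fires (outputs 1.0) while a log-sigmoid neuron would output 0.5.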

The least squares support vector machine (LSSVM) replaces the inequality constraints of the traditional support vector machine with equality constraints and uses the sum of squared errors as the training loss. Solving the quadratic programming problem of the support vector machine thus becomes solving a system of linear equations, which speeds up the solution;

the LSSVM optimization problem can be described as:

$$\min_{\omega, b, e} L(\omega, e) = \frac{1}{2}\omega^T\omega + \frac{1}{2}\gamma\sum_{i=1}^{n} e_i^2$$

$$\text{s.t.}\quad y_i = \omega^T\varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, n$$

where $L$ denotes the loss function;

$\omega$ denotes the weight vector;

$\gamma$ denotes the adjustable (regularization) parameter;

$e_i$ denotes the error;

$x_i$ denotes the input data;

$y_i$ denotes the output data;

$\varphi(\cdot)$ denotes the mapping function;

$b$ denotes the bias;

$T$ denotes transposition;

$i$ denotes the index of a data point ($i = 1, \ldots, n$);

$n$ denotes the total number of training data;

s.t. abbreviates "subject to";

the optimization problem is solved by the Lagrangian method:

$$\mathcal{L}(\omega, b, e, a) = L(\omega, e) - \sum_{i=1}^{n} a_i\left[\omega^T\varphi(x_i) + b + e_i - y_i\right]$$

The LSSVM model then takes the form

$$f(x) = \sum_{i=1}^{n} a_i K(x, x_i) + b$$

and the invention uses the radial basis function kernel

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$

where $K(x, x_i)$ is the kernel function, $a_i$ are the Lagrange multipliers, $e_i$ is the error, $n$ is the total number of training data, and $i$ is the index of a data point ($i = 1, \ldots, n$).
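Setting the derivatives of the Lagrangian to zero gives the standard LSSVM conditions $\omega = \sum_i a_i \varphi(x_i)$, $\sum_i a_i = 0$ and $a_i = \gamma e_i$, which collapse into a single linear system in $b$ and $a$. The NumPy sketch below solves that system with the RBF kernel; the values of `gamma` and `sigma` and the function names are illustrative assumptions, not settings from the text.

```python
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)) for row vectors A_i, B_j."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X: np.ndarray, y: np.ndarray, gamma: float, sigma: float):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; a] = [0; y] for b and a."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                       # first row encodes sum(a_i) = 0
    A[1:, 0] = 1.0                       # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # bias b, Lagrange multipliers a

def lssvm_predict(X_train: np.ndarray, a: np.ndarray, b: float,
                  X_new: np.ndarray, sigma: float) -> np.ndarray:
    """f(x) = sum_i a_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ a + b
```

Because $a_i = \gamma e_i$, a larger `gamma` forces smaller training residuals (closer interpolation), while a smaller one smooths more; `gamma` and `sigma` are exactly the kind of key parameters the genetic algorithm in the parameter-correction step could tune.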

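In one embodiment, bias correction of the prediction model comprises: while the model is running, correcting it with new data by a bias correction method driven by the model prediction error, calculated as follows:

$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$

where $\hat{y}_c(t)$ denotes the corrected model output at the current time;

$\hat{y}(t)$ denotes the predicted value output by the model at the current time;

$K$ denotes the correction coefficient;

$y(t-1)$ and $\hat{y}(t-1)$ denote the actual value and the model's predicted value at the previous time;

$t$ denotes the sampling time;

each coefficient $K_i$ is obtained by dividing the model error of the current period by the model error of the previous period, where $Y(t_i)$ denotes data within the current period;

a further term denotes the average of the predicted values in the current period;

$Y(t_i - t)$ denotes data within the previous period;

$Y_m(t_i - t)$ denotes the median of the predicted values in the previous period;

$K = \operatorname{median}(K_i)$.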

In one embodiment, model parameter correction of the prediction model comprises: taking the deviation between the model output and the actual value as the optimization objective, and optimizing the key parameters of the model with a genetic algorithm based on historical data, where the optimization objective is as follows:

A genetic algorithm starts its search from a set of randomly generated initial solutions, called a population. Each individual in the population is a candidate solution to the problem, called a chromosome. The chromosomes evolve over successive iterations, called generations. The genetic algorithm works mainly through selection, crossover, and mutation: crossover and mutation produce the next generation of chromosomes, called offspring, and the quality of a chromosome is measured by its fitness. A number of individuals are selected from the previous generation and the offspring according to fitness to form the next population, which continues to evolve; after a number of generations the algorithm converges to the best chromosome, which is likely to be the optimal or a suboptimal solution of the problem. Fitness measures how close each individual in the population is likely to come to the optimal solution, and the function that measures an individual's fitness is called the fitness function. The definition of the fitness function generally depends on the specific problem being solved.

The main procedure of a genetic algorithm using the three genetic operators (selection, crossover, and mutation) is as follows:

a. Initialization: set the generation counter v = 0 and the maximum number of generations V; randomly generate H individuals as the initial population Q(0);

b. Individual evaluation: compute the fitness of each individual in population Q(v);

c. Selection: apply the selection operator to the population;

d. Crossover: apply the crossover operator to the population;

e. Mutation: apply the mutation operator to the population; after selection, crossover, and mutation, population Q(v) yields the next population Q(v+1);

f. Termination check: if v ≤ V, set v = v + 1 and return to step b; if v > V, output the individual with the greatest fitness found during evolution as the optimal solution and stop.
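Steps a to f can be sketched as a small real-coded genetic algorithm. The optimization objective formula is not given in the text, so the sketch assumes a non-negative deviation measure (here a sum of squared deviations) and maps it to fitness as 1 / (1 + objective). Binary tournament selection, arithmetic crossover, Gaussian mutation, the rates, and the linear model in the usage example are all illustrative assumptions.

```python
import random

def genetic_optimize(objective, bounds, pop_size=40, max_gen=100,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize a non-negative objective(params) with a simple real-coded GA."""
    rng = random.Random(seed)

    def fitness(ind):                          # smaller deviation -> higher fitness
        return 1.0 / (1.0 + objective(ind))

    def clip(ind):                             # keep every gene inside its bounds
        return [min(max(g, lo), hi) for g, (lo, hi) in zip(ind, bounds)]

    # a. initialization: random population Q(0) of pop_size (H) individuals
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for v in range(max_gen + 1):               # f. keep evolving while v <= V
        scores = [fitness(ind) for ind in pop] # b. individual evaluation
        for ind, s in zip(pop, scores):
            if s > best_fit:                   # remember the best individual seen
                best, best_fit = ind, s
        if v == max_gen:
            break

        def select():                          # c. binary tournament selection
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:  # d. arithmetic crossover
                w = rng.random()
                c1 = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
                c2 = [(1 - w) * a + w * b for a, b in zip(p1, p2)]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):                 # e. Gaussian mutation per gene
                for g, (lo, hi) in enumerate(bounds):
                    if rng.random() < mutation_rate:
                        c[g] += rng.gauss(0.0, 0.1 * (hi - lo))
                children.append(clip(c))
        pop = children[:pop_size]              # next population Q(v + 1)
    return best

if __name__ == "__main__":
    # Hypothetical example: recover theta for y = theta0 + theta1 * x.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 + 3.0 * x for x in xs]
    sse = lambda th: sum((th[0] + th[1] * x - y) ** 2 for x, y in zip(xs, ys))
    print(genetic_optimize(sse, [(0.0, 5.0), (0.0, 5.0)]))
```

In the method described here, the genes would instead be the prediction model's key parameters (for an LSSVM, for example, the regularization parameter and kernel width), and the objective would be evaluated on historical data.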

In summary, with the above technical solution, collecting industrial data via the industrial Internet platform resolves the data-silo problem in chemical enterprises and fully exploits the value of data from different systems; the platform's built-in machine learning algorithm library, which contains dozens of mainstream algorithms, lets the model adapt better to frequent changes in operating conditions; the method can substantially reduce a plant's reliance on measuring equipment, which is significant for improving product quality, saving energy and reducing consumption, and accelerating the digital transformation of enterprises; and it can predict key indexes of chemical raw materials and products in real time, avoiding cases where certain indexes are time-consuming, difficult, or impossible to measure, thereby saving considerable time and resources.

The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A data-driven quality modeling method based on an industrial Internet platform, the method comprising the following steps:

S1, collecting data from different systems via the industrial Internet platform and consolidating them;

S2, preprocessing the collected data;

S3, selecting auxiliary variables according to process principles and process characteristics, and reducing their dimensionality by principal component analysis;

S4, constructing a key product quality prediction model using a data-driven modeling strategy;

S5, performing bias correction and model parameter correction on the established prediction model;

wherein preprocessing the collected data comprises the following steps:

S201, merging (fusing) the collected data and storing the result to obtain sample data;

S202, removing abnormal data from the sample data, filtering it, and normalizing it;

the collected data are fused using the following formula:

where $h_{1q}$ denotes the data collected by the business system at time $t_{1q}$;

$h_{2q}$ denotes the data collected by the production system at time $t_{2q}$;

$\varepsilon_{h1}$ denotes the root mean square error of the collected data $h_{1q}$;

$\varepsilon_{t1}$ denotes the root mean square error of time $t_{1q}$;

$\varepsilon_{h2}$ denotes the root mean square error of the collected data $h_{2q}$;

$\varepsilon_{t2}$ denotes the root mean square error of time $t_{2q}$;

$h_q$ denotes the fused result of the data collected by the business system and the production system at time $t_q$.

2. The data-driven quality modeling method based on an industrial Internet platform according to claim 1, wherein abnormal data are removed from the sample data by screening according to the 3σ criterion, with the following specific steps:

suppose an auxiliary variable $x$ has $n$ observations $x_1, x_2, \ldots, x_n$ in the sample data; compute their mean $\bar{x}$ and standard deviation $\sigma$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

if a value $x_i$ of the auxiliary variable in a sample satisfies

$$|x_i - \bar{x}| > 3\sigma,$$

the sample is removed as an abnormal sample; the 3σ check is then applied in turn to the other auxiliary variables in the sample, and the samples that pass are included in the modeling sample set.

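3. The method of claim 1, wherein filtering the sample data applies mean (moving-average) filtering to the samples according to the following formula:

$$\bar{X}(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \cdots + X(t) + \cdots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$

where $\bar{X}(t)$ is the filtered value and $t$ denotes the sampling time;

$T$ denotes the filtering time constant;

$T_c$ denotes the sampling period.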

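4. The data-driven quality modeling method based on an industrial Internet platform according to claim 1, wherein normalizing the data maps the sample data to $[y_{\min}, y_{\max}]$ with the following formula:

$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$

where $y_{\min}$ and $y_{\max}$ are the lower and upper bounds of the normalization target range;

$x_{\min}$ and $x_{\max}$ are the lower and upper bounds of the current variable.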

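5. The data-driven quality modeling method based on an industrial Internet platform according to claim 1, wherein the principal component analysis comprises the following steps:

1) Standardize the original sample data and form the standardized matrix:

let $X = (X_1, X_2, \ldots, X_m)^T$ be an m-dimensional random vector with $n$ samples $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$ $(i = 1, 2, \ldots, n)$, where $T$ denotes matrix transposition; form the sample matrix, standardize it, and compute the sample mean:

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$

sample variance:

$$s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$$

the standardized data are:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m$$

forming the standardized matrix $Z = (z_{ij})$;

2) Compute the sample correlation coefficient matrix of the standardized matrix:

$$R = (r_{ij})_{m \times m}, \quad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki} z_{kj}$$

where $r_{ij}$ is the element in row $i$, column $j$ of $R$ $(i, j = 1, 2, \ldots, m)$;

3) Determine the principal components:

solve the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix $R$ to obtain its $m$ eigenvalues, where $\lambda$ denotes an eigenvalue and $I_m$ the identity matrix; because $R$ is symmetric, the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ are obtained by the Jacobi method; choose $p$ according to $\sum_{j=1}^{p}\lambda_j / \sum_{j=1}^{m}\lambda_j \ge 85\%$ so that the cumulative contribution rate (information utilization rate) reaches at least 85%, giving $p$ principal components; for each $\lambda_j$ $(j = 1, 2, \ldots, p)$, solve $Rb = \lambda_j b$ to obtain the unit eigenvector $b_j$, where $b$ denotes an eigenvector;

4) Convert the standardized index variables into principal components:

$$U_j = z^T b_j, \quad j = 1, 2, \ldots, p$$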

6. The data-driven quality modeling method based on an industrial Internet platform according to claim 1, wherein the key product quality prediction model is constructed with the data-driven modeling strategy by applying an algorithm from the machine learning algorithm library built into the industrial Internet platform to the preprocessed data.

7. The method of claim 1, wherein bias correction of the prediction model comprises: while the model is running, correcting it with new data by a bias correction method driven by the model prediction error, calculated as follows:

$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$

where $\hat{y}_c(t)$ denotes the corrected model output at the current time;

$\hat{y}(t)$ denotes the predicted value output by the model at the current time;

$K$ denotes the correction coefficient;

$y(t-1)$ and $\hat{y}(t-1)$ denote the actual value and the model's predicted value at the previous time;

$t$ denotes the sampling time;

each coefficient $K_i$ is obtained by dividing the model error of the current period by the model error of the previous period,

where $Y(t_i)$ denotes data within the current period;

a further term denotes the average of the predicted values in the current period;

$Y(t_i - t)$ denotes data within the previous period;

$Y_m(t_i - t)$ denotes the median of the predicted values in the previous period;

the correction coefficient is obtained by taking the median of the $K_i$, i.e. $K = \operatorname{median}(K_i)$.

8. The method of claim 1, wherein model parameter correction of the prediction model comprises: taking the deviation between the model output and the actual value as the optimization objective, and optimizing the key parameters of the model with a genetic algorithm based on historical data, where the optimization objective is as follows: