CN117973852A - Chemical safety assessment method based on improved radial basis function Bayesian network - Google Patents
- Publication number
- CN117973852A (application number CN202410091657.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- model
- weight
- feature
- radial basis
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/02—Computing arrangements based on specific mathematical models using fuzzy logic
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a chemical safety assessment method based on an improved radial basis function Bayesian network. The method comprises: obtaining a public data set, processing it, and dividing it into a training set and a testing set; combining the PCA algorithm with the entropy weight method to perform weight analysis on the data and obtain a weight matrix; constructing a Bayesian network model based on a radial basis function, replacing the weights of the model with the processed weight matrix, and using the radial basis function as the activation function of the hidden layer; defining fuzzy sets and membership functions with a fuzzy logic algorithm, and defining a fuzzy control system and an evaluation simulator according to the data set to serve as the rules for risk judgment; and building the improved radial basis function Bayesian network model, training it, predicting data, and performing risk assessment on the predicted data. The method improves the correlation between data, effectively monitors potential hazards, and greatly improves the safety of a chemical industry park.
Description
Technical field:
The invention belongs to the field of chemical safety evaluation, and particularly relates to a chemical safety evaluation method based on an improved radial basis kernel Bayesian network.
Background
Chemical safety assessment systems are a central industrial safety tool whose main purpose is to help chemical enterprises assess and manage their potential safety risks. Safety is particularly important because the chemical industry involves complex chemical reactions and large amounts of hazardous substances. Previously, many safety assessment systems relied on extensive manual input and operation: specialized chemical engineers had to enter equipment parameters, operating conditions and the like by hand and carry out complex calculations to arrive at risk assessment results. This reliance on manual operation is not only inefficient but may also make the evaluation result inaccurate because of human error. In recent years, with the rise of artificial intelligence, deep learning and machine learning have been widely used in the field of safety assessment and are continuously being refined.
In the field of chemical safety evaluation, Bayesian networks and graph-theoretic methods are widely used; because of their high interpretability and efficiency, they are often applied to prediction and classification. Chemical safety evaluation requires correlation processing of the environmental parameters of a chemical industry park. Although Bayesian networks are often applied in this field, their high data requirements and high computational complexity limit model accuracy, so a multi-core Bayesian network method can be adopted to overcome these problems. The distinguishing feature of replacing a single core with multiple cores is the introduction of parallel computing: a multi-core Bayesian network can execute several calculation tasks simultaneously and distribute the workload across several cores or processors, which improves calculation efficiency and speed and, in turn, the accuracy of the model. However, a multi-core Bayesian network also faces technical problems of its own; for example, parallel computation across several cores increases the complexity of the model. To address this, the PCA algorithm and the entropy weight method are used to reduce the dimension of the data and the computational complexity. PCA converts the data into a smaller number of principal components, so data processing and calculation become more efficient, the parallel computing capacity of the model is greatly improved, and the accuracy of the model improves further.
Purpose of the invention: in view of the problems pointed out in the background art, the invention provides a chemical safety assessment method based on an improved radial basis kernel Bayesian network, which aims to solve the low accuracy and instability of existing chemical risk assessment systems.
Technical scheme: the invention provides a chemical safety assessment method based on an improved radial basis function Bayesian network, which comprises the following steps:
S1, monitoring the environment of a chemical industry park storage tank with sensors to acquire the temperature, humidity, gas concentration and air pressure inside the storage tank, aggregating these environmental parameters into a public data set, then cleaning and normalizing the public data set and dividing it into a training set and a testing set as the input of the subsequent model;
S2, combining the PCA algorithm with the entropy weight method to perform weight analysis on the environmental data, and classifying the features in light of the mutual influence among the environmental factors in the storage tank, so as to obtain a weight matrix for subsequent use;
S3, constructing a radial basis function Bayesian network model, setting the model parameters, replacing the weights of the model with the processed weight matrix, processing the data with a radial basis function as the activation function of the hidden layer, and passing the result to the output layer as the output of the model;
S4, defining fuzzy sets and membership functions with a fuzzy logic algorithm, and defining a fuzzy control system and an evaluation simulator from the safety range of each feature of the data set to serve as the rules for risk judgment;
S5, calling the improved radial basis function Bayesian network model from S3, training the model and predicting data, and assessing the predicted data with the evaluation system to obtain and judge their risk levels, so that the safety of the environment can be judged in advance.
Further, the specific steps of S1 are as follows:
S1.1, acquiring internal environment data of a storage tank in a chemical park, and storing the data in chemical_data.csv, wherein a data set comprises temperature, humidity, gas concentration and gas pressure parameters;
S1.2, reading data from chemical_data.csv, parsing the data with '\t' as the separator, and setting a random seed to ensure the reproducibility of experiments;
S1.3, dividing the data set into a training set and a testing set by index, and then normalizing the data with the MinMaxScaler class in the sklearn library;
S1.4, writing a format conversion function that uses the NumPy library to convert the training set and the testing set into the three-dimensional array format accepted by the model.
Further, in step S2 the PCA algorithm is combined with the entropy weight method as follows:
S2.1, performing principal component analysis (PCA): centering the initial weight matrix $T$ to obtain the centered feature matrix $T_{center}$, then calculating the feature covariance matrix $C = \frac{1}{m} T_{center}^{\top} T_{center}$, where $\top$ denotes transposition and $m$ is the number of samples, and finally performing eigenvalue decomposition on $C$ to obtain the eigenvalue array $T_{value}$ and the eigenvector matrix $T_{vectors}$, and selecting the first $k$ principal components, namely $T_{vectors}[:k]$, in descending order of eigenvalue;
S2.2, invoking the entropy weight method to calculate weights, summing each feature by row to obtain the sum of feature values $F_{sum}$;
Calculating the ratio of each feature: $p_{ij} = T[i,j] / F_{sum}[i]$, where $T[i,j]$ is a single feature value and $F_{sum}[i]$ is the sum of the values of the $i$-th feature.
Calculating the entropy value $H_i$ of each feature from the ratios $p_{ij}$, where $i$ denotes the $i$-th feature, $j$ denotes the sample index, and $m$ is the number of samples.
Calculating the weight of each feature: $W_i = H_i / \mathrm{sum}(H)$, where $W_i$ is the weight of the $i$-th feature, $H_i$ is the entropy of the $i$-th feature, and $\mathrm{sum}(H)$ is the sum of the entropies of all features.
S2.3, calculating the final weights by combining the principal components and the weights:
According to the cumulative contribution rate $C_{var}[:k]$ of the principal components and the feature weights $W[:k]$, the final weight $W_f = C_{var}[:k] \cdot W[:k]$ is calculated and used as the weight matrix of the subsequent model.
Further, the specific steps of S3 are as follows:
Constructing the improved radial basis function Bayesian network model. The radial basis function Bayesian network consists of three layers: the input layer, the hidden layer and the output layer. Assuming that the input sample is $x$ and the corresponding target output is $y$, the model is constructed as follows:
S3.1, calculating the activation value of the hidden layer:
For the $i$-th hidden layer neuron, the activation function is the Gaussian radial basis function $\varphi_i(x) = \exp\left(-\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2}\right)$, where $\mu_i$ is the center point of the hidden layer neuron and $\sigma$ is the width of the radial basis function;
S3.2, calculating the weights from the hidden layer to the output layer:
The weight $w_j$ of the $j$-th neuron of the output layer can be estimated by the least squares method: $w_j = \frac{\sum_i \alpha_i y_i \varphi_i(x)}{\sum_i (\alpha_i \varphi_i(x))^2}$, where $\alpha_i$ is the weighting factor of the hidden layer neuron and $y_i$ is the target output of the $i$-th sample;
S3.3, predictive calculation of the output layer:
The predicted value $Y_{pre}$ of the output layer is obtained by multiplying the output of each hidden layer neuron by its corresponding weight and then summing: $Y_{pre} = \sum_j w_j \varphi_j(x)$, where $j$ is the index of the output layer neuron;
S3.4, building the model by combining the network layer calculations above with the weight matrix obtained in step S2.
Further, the specific steps of S4 are as follows:
S4.1, calling the fuzzy logic library scikit-fuzzy to create a fuzzy-logic-based system, defining fuzzy variables and fuzzy sets from the storage tank internal environment data set, and dividing them into levels according to the safety standards for each factor, including temperature, humidity, gas concentration and air pressure;
S4.2, establishing membership functions from the defined fuzzy sets and the nationally issued safety criteria for storage tank internal environment data;
S4.3, defining a fuzzy control system and an evaluation simulator, and writing fuzzy logic rules based on the membership functions to define the relation between the input and output fuzzy variables, thereby realizing the evaluation control system.
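Steps S4.1 to S4.3 can be illustrated with a minimal sketch. It is written in plain NumPy as a stand-in for scikit-fuzzy, and every breakpoint, level name and rule below is a hypothetical assumption rather than a value from the patent or from any safety standard.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (requires a < b < c)."""
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return float(np.clip(min(left, right), 0.0, 1.0))

# Hypothetical fuzzy sets; real breakpoints would come from the applicable
# safety criteria, which this sketch does not reproduce.
TEMPERATURE_C = {
    "safe": (0.0, 20.0, 40.0),
    "warning": (30.0, 45.0, 60.0),
    "danger": (50.0, 70.0, 90.0),
}
GAS_CONCENTRATION = {
    "safe": (0.0, 0.05, 0.15),
    "warning": (0.10, 0.20, 0.30),
    "danger": (0.25, 0.40, 0.55),
}

def risk_level(temperature, gas):
    """Illustrative rule per level: risk is L if temperature is L OR gas is L."""
    best_level, best_strength = None, -1.0
    # Levels are checked from most to least severe so ties resolve toward caution.
    for level in ("danger", "warning", "safe"):
        strength = max(
            trimf(temperature, *TEMPERATURE_C[level]),
            trimf(gas, *GAS_CONCENTRATION[level]),
        )
        if strength > best_strength:
            best_level, best_strength = level, strength
    return best_level

print(risk_level(20.0, 0.05), risk_level(45.0, 0.05), risk_level(70.0, 0.05))
```

Inputs outside every triangle receive zero membership in this sketch; a fuller system would add shoulder-shaped sets at the extremes and the remaining variables (humidity and air pressure).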
Further, the specific steps of training the model in S5 and predicting the data are as follows:
S5.1, unifying the dimensions of the training set and the testing set, training the model on the training set, and feeding the testing set to the model to obtain predicted data;
S5.2, combining the predicted data with the fuzzy control system and the evaluation simulator from step S4 to assess the risk of the data, and storing and displaying the risk level in a file.
Beneficial effects:
Based on an existing chemical industry park environment data set, the invention uses as its prediction model a radial basis kernel Bayesian network whose weights are analyzed by the PCA algorithm and the entropy weight method, and combines it with an evaluation system built on a fuzzy logic algorithm to evaluate the safety of the chemical industry park environment. After the data are cleaned and normalized, weight analysis is first performed with the PCA algorithm and the entropy weight method; the resulting weight matrix then serves as the weights of the subsequent radial basis kernel Bayesian network algorithm, and the model is built with a Gaussian radial basis kernel function as the activation function. Weight analysis with the PCA algorithm and the entropy weight method can greatly improve the correlation of the data and reduces the data dimension and the number of attributes seen by the radial basis function Bayesian network model, which lowers the complexity of model training and calculation. Reducing the data dimension reduces the amount of data processed during training and improves training efficiency. Reducing the number of attributes also simplifies the calculation process of the model, speeds up prediction, and improves the parallel computing capability of the model.
In the invention, the model is built with a radial basis kernel Bayesian network algorithm that turns the single-core Bayesian network into a multi-core one: several processing cores execute several calculation tasks simultaneously, and the workload is distributed across several cores or processors, which improves calculation efficiency and speed. The model can therefore reason over and learn from a large amount of data quickly, with improved parallel computing capacity; compared with a plain Bayesian network it also has nonlinear modeling capability, which greatly improves accuracy. In addition, compared with other algorithms, the model is expressed with one or more radial basis functions, which generally have an intuitive physical meaning, so the model is more interpretable, its inference process and prediction results are easier to understand, and the data prediction is more accurate.
Drawings
FIG. 1 is a general flow chart of a chemical safety evaluation study based on a modified radial basis function Bayesian network;
FIG. 2 is a flow chart of a method of preprocessing a data set;
FIG. 3 is a flow chart of PCA and entropy weight analysis;
FIG. 4 is a radial basis function Bayesian network model construction flow chart;
FIG. 5 is a diagram of a risk assessment system.
Specific implementation steps
The application is further illustrated below with specific examples based on environmental parameter data from a chemical industry park. These examples are intended to illustrate the application, not to limit its scope; after reading the application, equivalent modifications of it fall within the scope defined by the appended claims.
The invention discloses a chemical safety assessment method based on an improved radial basis function Bayesian network, which comprises the following specific examples:
S1: The environment of a chemical industry park storage tank is monitored with sensors to obtain environmental parameters such as the temperature, humidity, gas concentration and air pressure inside the tank. These parameters are aggregated into a public data set, which is then cleaned, normalized, and divided into a training set and a testing set that serve as the input of the subsequent model, as shown in FIG. 2:
S1.1: The internal environment data of the storage tank in the chemical park are acquired, including parameters such as temperature, humidity, gas concentration and gas pressure, and stored in chemical_data.csv.
S1.2: Data are read from chemical_data.csv and parsed with '\t' as the separator, and a random seed is set to ensure the reproducibility of experiments.
S1.3: The data set is partitioned into a training set and a testing set by index, and the data are then normalized with the MinMaxScaler class in the sklearn library.
First, a MinMaxScaler object is created.
Secondly, the training data are normalized. Let the original training data be $X_{\mathrm{train}}$, and let the minimum and maximum of the $i$-th feature be $X_{\mathrm{trainmin},i}$ and $X_{\mathrm{trainmax},i}$. For each sample $j$, the normalized value of the $i$-th feature is $X_{\mathrm{trainscaled},ij}$. The normalization formula is:
$$X_{\mathrm{trainscaled},ij} = \frac{X_{\mathrm{train},ij} - X_{\mathrm{trainmin},i}}{X_{\mathrm{trainmax},i} - X_{\mathrm{trainmin},i}}$$
where $i \in [1, n]$, $j \in [1, m]$, $n$ is the number of features, and $m$ is the number of samples in the training data set.
Finally, the test data are normalized. Let the original test data be $X_{\mathrm{test}}$; the minimum and maximum of the $i$-th feature are again $X_{\mathrm{trainmin},i}$ and $X_{\mathrm{trainmax},i}$ (the same as for the training data). For each sample $j$, the normalized value of the $i$-th feature is $X_{\mathrm{testscaled},ij}$. The normalization formula is:
$$X_{\mathrm{testscaled},ij} = \frac{X_{\mathrm{test},ij} - X_{\mathrm{trainmin},i}}{X_{\mathrm{trainmax},i} - X_{\mathrm{trainmin},i}}$$
where $i \in [1, n]$, $j \in [1, m]$, $n$ is the number of features, and $m$ is the number of samples in the test data set.
S1.4: a format conversion function is written that converts the training set and the testing set into a three-dimensional array format that is accepted by the model using the NumPy library.
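The preprocessing in S1.1 to S1.4 can be sketched as follows. This is a NumPy-only illustration of the same arithmetic that MinMaxScaler performs when fitted on the training set; the inline sample data, the column names, the 80/20 split, the seed value and the (samples, 1, features) target shape are all assumptions, since the source does not specify them.

```python
import io
import numpy as np

# Hypothetical tab-separated sample standing in for chemical_data.csv;
# the column names and values are illustrative assumptions.
RAW = (
    "temperature\thumidity\tgas_concentration\tpressure\n"
    "25.0\t40.0\t0.10\t101.3\n"
    "27.5\t42.0\t0.12\t101.1\n"
    "30.0\t45.0\t0.20\t100.8\n"
    "32.5\t50.0\t0.35\t100.2\n"
    "35.0\t55.0\t0.50\t99.9\n"
)

def load_and_split(text, train_fraction=0.8):
    """Parse tab-separated rows and split them by index into train/test."""
    data = np.genfromtxt(io.StringIO(text), delimiter="\t", skip_header=1)
    split = int(len(data) * train_fraction)
    return data[:split], data[split:]

def min_max_fit(x_train):
    """Per-feature minimum and range, taken from the training set only."""
    x_min = x_train.min(axis=0)
    x_range = x_train.max(axis=0) - x_min
    x_range[x_range == 0] = 1.0  # avoid division by zero for constant features
    return x_min, x_range

def min_max_apply(x, x_min, x_range):
    """Scale with the training statistics; test values may fall outside [0, 1]."""
    return (x - x_min) / x_range

def to_3d(x):
    """Reshape (samples, features) to (samples, 1, features)."""
    return x.reshape(x.shape[0], 1, x.shape[1])

np.random.seed(42)  # fixed seed as in S1.2; the value 42 is an assumption
x_train, x_test = load_and_split(RAW)
x_min, x_range = min_max_fit(x_train)
x_train_3d = to_3d(min_max_apply(x_train, x_min, x_range))
x_test_3d = to_3d(min_max_apply(x_test, x_min, x_range))
print(x_train_3d.shape, x_test_3d.shape)
```

Reusing the training minimum and maximum for the test set, as the formulas in S1.3 require, keeps information about the test data out of the scaler.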
S2: The PCA algorithm is combined with the entropy weight method to perform weight analysis on the environmental data; the features are classified in light of the mutual influence among the environmental factors in the storage tank, and the resulting weight matrix is kept for subsequent use, as shown in FIG. 3:
S2.1: Principal component analysis (PCA) is performed. The initial weight matrix $T$ is centered to obtain the centered feature matrix $T_{center}$, and the feature covariance matrix is then calculated:
$$C = \frac{1}{m} T_{center}^{\top} T_{center}$$
where $\top$ denotes transposition and $m$ is the number of samples. Eigenvalue decomposition of the covariance matrix $C$ then gives the eigenvalue array $T_{value}$ and the eigenvector matrix $T_{vectors}$. The first $k$ principal components, $T_{vectors}[:k]$, are selected in descending order of eigenvalue.
Finally, the cumulative contribution rate $C_{var}[:k]$ of the principal components is calculated, where
$$C_{var,1} = \frac{T_{value}[0]}{\mathrm{sum}(T_{value})}$$
$$C_{var,i} = C_{var,i-1} + \frac{T_{value}[i-1]}{\mathrm{sum}(T_{value})}$$
Here $C_{var,1}$ is the proportion of variance explained by the first principal component and $C_{var,i}$ is the cumulative proportion of variance explained by the first $i$ principal components, with $i$ increasing from 2 to $k$; $C_{var,k}$ is therefore the cumulative proportion explained by the first $k$ principal components.
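The PCA step in S2.1 can be sketched with NumPy as below. The random input matrix is illustrative only, and the sketch assumes samples in rows and features in columns, so the top-k eigenvectors are taken as columns of the eigenvector matrix (the convention used by `numpy.linalg.eigh`).

```python
import numpy as np

def pca_eig(T, k):
    """PCA by eigendecomposition of the covariance matrix C = (1/m) Tc^T Tc.

    T has shape (m, n): m samples in rows, n features in columns.
    Returns the top-k eigenvectors, the first k cumulative contribution
    rates, and the centered data.
    """
    m = T.shape[0]
    T_center = T - T.mean(axis=0)            # center each feature
    C = (T_center.T @ T_center) / m          # feature covariance matrix
    T_value, T_vectors = np.linalg.eigh(C)   # eigh is suited to symmetric C
    order = np.argsort(T_value)[::-1]        # sort eigenvalues in descending order
    T_value, T_vectors = T_value[order], T_vectors[:, order]
    C_var = np.cumsum(T_value) / T_value.sum()
    return T_vectors[:, :k], C_var[:k], T_center

# Illustrative data: 50 samples of 4 features with different spreads.
rng = np.random.default_rng(0)
T = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
components, C_var, T_center = pca_eig(T, k=2)
X_transformed = T_center @ components       # 4 features reduced to 2
print(X_transformed.shape, C_var)
```

The running sum `np.cumsum(T_value) / T_value.sum()` is the closed form of the recurrence for the cumulative contribution rate given above.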
S2.2: The entropy weight method is invoked to calculate weights. Each feature is summed by row to obtain the sum of feature values $F_{sum}$.
The sum of feature values is $F_{sum}$ = X.sum(axis=1): if the feature matrix $X$ has shape (n, m), then $F_{sum}$ has shape (n,), and each element $F_{sum}[i]$ is the sum of the values of the $i$-th feature over all samples.
Calculating the ratio of each feature: $p_{ij} = T[i,j] / F_{sum}[i]$, where $T[i,j]$ is a single feature value and $F_{sum}[i]$ is the sum of the values of the $i$-th feature.
Calculating the entropy value $H_i$ of each feature from the ratios $p_{ij}$, where $i$ denotes the $i$-th feature, $j$ denotes the sample index, and $m$ is the number of samples.
Calculating the weight of each feature: $W_i = H_i / \mathrm{sum}(H)$, where $W_i$ is the weight of the $i$-th feature, $H_i$ is the entropy of the $i$-th feature, and $\mathrm{sum}(H)$ is the sum of the entropies of all features.
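A minimal sketch of S2.2 follows. The entropy formula itself is not legible in this text, so the sketch assumes the plain Shannon entropy of the ratios, $H_i = -\sum_j p_{ij} \ln p_{ij}$, without any normalizing constant; the input matrix is illustrative and assumed non-negative, with features in rows as described above.

```python
import numpy as np

def entropy_weights(X):
    """Entropy-based feature weights for X of shape (n_features, m_samples)."""
    X = np.asarray(X, dtype=float)
    F_sum = X.sum(axis=1)                    # per-feature sum over samples
    P = X / F_sum[:, None]                   # p_ij = X[i, j] / F_sum[i]
    # Assumed Shannon entropy per feature; 0 * log(0) is treated as 0.
    log_P = np.log(P, out=np.zeros_like(P), where=P > 0)
    H = -(P * log_P).sum(axis=1)
    W = H / H.sum()                          # W_i = H_i / sum(H), as in the text
    return W, H

# Illustrative matrix: one uniform, one skewed, one fully concentrated feature.
X = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [1.00, 0.00, 0.00, 0.00],
])
W, H = entropy_weights(X)
print(W)
```

Under the rule stated in the text, a feature with higher entropy receives a larger weight. The more common textbook form of the entropy weight method instead weights each feature by $1 - H_i$ after normalizing the entropy, so the choice of rule is worth checking against the original filing.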
S2.3: The final weights are calculated by combining the principal components and the weights:
Based on the cumulative contribution rate $C_{var}[:k]$ of the principal components and the feature weights $W[:k]$, the final weights $W_f = C_{var}[:k] \cdot W[:k]$ are calculated and used as the weight analysis matrix of the subsequent model.
For the storage tank internal environment data set, the PCA algorithm reduces the data from 4 dimensions to 2, giving a new data set X_transformed in which each sample has only two features.
Next, the weight coefficient of each feature is calculated with the entropy weight method. For the four features (temperature, humidity, gas concentration and gas pressure), the probability distribution $p$ of each is calculated, and its entropy is computed according to the definition of information entropy. Finally, the weight coefficient $W_i$ of each feature is calculated as the ratio of its entropy to the total entropy. These weight coefficients reflect the importance of each feature in the data set and can be used to adjust the influence of each feature, which yields the weight matrix of the data set.
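The combination in S2.3 can be sketched as below. The text does not say whether the product of $C_{var}[:k]$ and $W[:k]$ is element-wise or a dot product; this sketch assumes element-wise multiplication so that the result remains a weight vector, and the numeric values are illustrative rather than taken from the source.

```python
import numpy as np

def combine_weights(C_var, W, k):
    """Element-wise product of the first k cumulative rates and feature weights."""
    C_var = np.asarray(C_var, dtype=float)
    W = np.asarray(W, dtype=float)
    return C_var[:k] * W[:k]

# Illustrative values only (assumed, not from the patent).
C_var = np.array([0.62, 0.85, 0.96, 1.00])
W = np.array([0.30, 0.28, 0.22, 0.20])
W_f = combine_weights(C_var, W, k=2)
print(W_f)
```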
S3: An improved radial basis function Bayesian network model is constructed and its parameters are set. The weights of the model are replaced with the processed weight matrix, the data are processed with a radial basis function as the activation function of the hidden layer, and the result is passed to the output layer as the output of the model, as shown in FIG. 4:
S3.1: Calculating the activation value of the hidden layer:
For the $i$-th hidden layer neuron, the activation function is the Gaussian radial basis function $\varphi_i(x) = \exp\left(-\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2}\right)$, where $\mu_i$ is the center point of the hidden layer neuron and $\sigma$ is the width (standard deviation) of the radial basis function.
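The hidden layer activation of S3.1 can be sketched as follows, assuming the standard Gaussian form $\exp(-\lVert x - \mu_i \rVert^2 / (2\sigma^2))$ with $\sigma$ read as a standard deviation. The sample points and centers are hypothetical.

```python
import numpy as np

def rbf_activations(X, centers, sigma):
    """phi[s, i] = exp(-||X[s] - centers[i]||^2 / (2 * sigma^2))."""
    diff = X[:, None, :] - centers[None, :, :]   # (samples, neurons, features)
    sq_dist = (diff ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical inputs and hidden-neuron centers mu_i.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
phi = rbf_activations(X, centers, sigma=1.0)
print(phi)
```

Each activation is 1 when the input coincides with the neuron's center and decays toward 0 with distance, at a rate set by $\sigma$.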
S3.2: Weight calculation from the hidden layer to the output layer:
The weight $w_j$ of the $j$-th neuron of the output layer can be estimated by the least squares method: $w_j = \frac{\sum_i \alpha_i y_i \varphi_i(x)}{\sum_i (\alpha_i \varphi_i(x))^2}$, where $\alpha_i$ is the weighting factor of the hidden layer neuron and $y_i$ is the target output of the $i$-th sample.
S3.3: Predictive calculation of the output layer:
The predicted value $Y_{pre}$ of the output layer is obtained by multiplying the output of each hidden layer neuron by its corresponding weight and then summing: $Y_{pre} = \sum_j w_j \varphi_j(x)$, where $j$ is the index of the output layer neuron.
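Steps S3.2 and S3.3 can be sketched together. Note the substitution: rather than the closed-form expression in the text, this sketch uses the general least squares solver `numpy.linalg.lstsq` on the matrix of hidden layer activations, then forms $Y_{pre}$ as the weighted sum of activations. The one-dimensional inputs, centers, width and target weights are assumptions for illustration.

```python
import numpy as np

def rbf_design(x, centers, sigma):
    """Phi[s, j] = exp(-(x[s] - centers[j])^2 / (2 * sigma^2)) for 1-D inputs."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * sigma ** 2))

def fit_output_weights(Phi, y):
    """Least squares estimate of the hidden-to-output weights."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict(Phi, w):
    """Y_pre = sum_j w_j * phi_j(x), evaluated for every sample."""
    return Phi @ w

# Synthetic targets generated from known weights, so an exact fit exists.
x = np.linspace(0.0, 4.0, 30)
centers = np.linspace(0.0, 4.0, 5)
Phi = rbf_design(x, centers, sigma=0.8)
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = Phi @ w_true
w = fit_output_weights(Phi, y)
Y_pre = predict(Phi, w)
print(np.abs(Y_pre - y).max())
```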
S3.4: combining the calculation of the model network layer and the weight matrix obtained by the step S2, and building the model, wherein:
Assume that the original feature matrix is $X \in \mathbb{R}^{n \times m}$ and the feature weight array is $W \in \mathbb{R}^{m}$, where n represents the number of samples and m represents the number of features.
The new feature matrix obtained by combining the PCA algorithm and the entropy weight method is:
where $w_j$ represents the weight of the j-th feature and $x_{ij}$ represents the j-th feature value of the i-th sample.
Let a training data set be given, in which $X \in \mathbb{R}^{n \times m}$ represents the sample features and $y_i \in \{c_1, c_2, \dots, c_k\}$ represents the sample class; a test data set is given as well.
For the training data set, feature standardization or normalization preprocessing can be performed to obtain a new feature matrix. Then, a radial basis kernel Bayes classifier is established using the Gaussian naive Bayes algorithm, and the model is as follows:
where $\gamma$ is a hyperparameter, the sample vector is obtained by standardizing or normalizing the feature values of the i-th sample in the training data set, and $I(y_i = c_k)$ is an indicator function that equals 1 when $y_i = c_k$ and 0 otherwise.
Combining the above steps yields the improved radial basis kernel Bayesian network model.
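For orientation only, the sketch below shows a plain Gaussian naive Bayes classifier applied to min-max normalized features. It is a hypothetical stand-in: it does not include the radial basis kernel term with the hyperparameter gamma, whose exact form is not reproduced in this text, and the class `TinyGaussianNB`, the helper `min_max_normalize`, and the sample readings are all assumptions.

```python
import numpy as np

class TinyGaussianNB:
    """Minimal Gaussian naive Bayes; a stand-in, not the patented model."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class prior, feature means and variances (small floor avoids division by zero).
        self.prior_ = np.array([(y == c).mean() for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Log-likelihood per class, summed over features treated as independent.
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_like = -0.5 * (
            np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None]
        ).sum(axis=2)
        return self.classes_[np.argmax(log_like + np.log(self.prior_)[None], axis=1)]

def min_max_normalize(X, lo, hi):
    """Scale each feature to [0, 1] using bounds taken from the training data."""
    return (np.asarray(X, dtype=float) - lo) / (hi - lo)

# Hypothetical readings: temperature (deg C) and pressure (MPa); class 1 means at risk.
X_train = np.array([[20, 0.9], [22, 1.0], [25, 1.1], [55, 1.5], [58, 1.6], [60, 1.55]])
y_train = np.array([0, 0, 0, 1, 1, 1])
lo, hi = X_train.min(axis=0), X_train.max(axis=0)

model = TinyGaussianNB().fit(min_max_normalize(X_train, lo, hi), y_train)
pred = model.predict(min_max_normalize([[21, 0.95], [57, 1.58]], lo, hi))
print(pred)
```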
S4: Using a fuzzy logic algorithm, define the fuzzy sets and membership functions and, from the safety range of each feature of the data set, define a fuzzy control system and an evaluation simulator to serve as the rules for risk evaluation, specifically comprising the following steps:
S4.1: The fuzzy logic library scikit-fuzzy is called to create a fuzzy-logic-based system. Fuzzy variables and fuzzy sets are defined from the internal environment data set of the storage tank and divided into levels according to the safety standards of each factor (temperature, humidity, gas concentration and air pressure), to facilitate subsequent risk assessment. The numerical range of each parameter is set, for example:
Temperature: -40 ℃ to 60 ℃,
Humidity: 0% to 70%,
Gas concentration (ammonia concentration): 0 ppm to 25 ppm, where ppm (parts per million) is a unit of concentration; a concentration of 1 ppm means that the substance makes up one millionth of the measured sample,
Air pressure: 0.8 MPa to 1.6 MPa;
S4.2: Membership functions are established from the defined fuzzy sets and the nationally issued safety criteria for storage tank internal environment data, and serve as the basis for subsequent risk assessment. The safety range of each parameter is divided into different degrees, such as low temperature, high temperature, low humidity, high humidity, low concentration, high concentration, low pressure and high pressure, with the division criteria set according to the specific parameter.
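A membership function of the kind described in S4.2 can be sketched in plain Python (not with scikit-fuzzy) as follows. The parameter ranges come from S4.1; the triangular shape, the choice of only "low" and "high" sets, and the names `trimf`, `RANGES` and `memberships` are assumptions made for illustration.

```python
RANGES = {
    "temperature": (-40.0, 60.0),  # degrees Celsius
    "humidity": (0.0, 70.0),       # percent
    "gas": (0.0, 25.0),            # ppm (ammonia)
    "pressure": (0.8, 1.6),        # MPa
}

def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a <= b <= c)."""
    if x <= a or x >= c:
        # A shoulder (a == b or b == c) keeps full membership at that boundary.
        return 1.0 if (x == b and (a == b or b == c)) else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def memberships(name, value):
    """Degrees of membership in the hypothetical 'low' and 'high' sets."""
    lo, hi = RANGES[name]
    return {"low": trimf(value, lo, lo, hi), "high": trimf(value, lo, hi, hi)}

print(memberships("temperature", 10.0))
print(memberships("gas", 25.0))
```

With these overlapping sets, a reading in the middle of its range belongs equally to "low" and "high", while a reading at the upper bound belongs fully to "high".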
S4.3: A fuzzy control system and an evaluation simulator are defined, and fuzzy logic rules are written using the membership functions to define the relationship between the input fuzzy variables and the output fuzzy variable, thereby implementing the evaluation control system. The system provides a set of fuzzy rules connecting the input and output variables; the rules determine the fuzzy value of the output from the fuzzy values of the inputs.
To determine the rules, the degree of influence of each factor on the safety of the storage tank is set. For example, a change in temperature may cause a change in air pressure, and a change in gas concentration may also cause a change in air pressure, which may in turn cause the storage tank to explode; temperature, gas concentration and air pressure therefore have the highest degree of influence, and the rules must take them into account together. For example, a temperature that is too high or too low raises the safety risk of the storage tank, and the same holds for gas concentration and air pressure. All factors must nevertheless be judged together to reach a comprehensive evaluation.
The risk level is divided into normal (0) and at risk (1). When the risk level obtained from evaluating the data is 1, the data at that moment indicate a risk, so whether the storage tank is at risk can be judged in advance, greatly improving chemical safety.
Each rule consists of a condition part and a conclusion part. The condition part combines the input variables using logical operators (e.g., "or" and "and"), while the conclusion part represents the fuzzy value of the output variable. Given specific values for a set of input variables, the contribution of each rule to the final output is determined by calculating its activation degree. The fuzzy value of the output variable is then calculated by aggregating all activated rules and defuzzifying the result. The behavior of the fuzzy control system is thus described by its rules and functions, after which the safety evaluation is performed.
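The rule mechanism can be illustrated with a small plain-Python sketch in which "and" is taken as min, "or" as max, and the activated rules are aggregated with max. The two rules, the dictionary keys, and the 0.5 threshold that stands in for a full defuzzification step are all hypothetical; the actual rule base is not listed in this text.

```python
def risk_degree(m):
    """m maps names such as 'temperature_high' to membership degrees in [0, 1]."""
    rules = [
        # IF (temperature is high OR temperature is low) AND pressure is high THEN risk
        min(max(m["temperature_high"], m["temperature_low"]), m["pressure_high"]),
        # IF gas concentration is high AND pressure is high THEN risk
        min(m["gas_high"], m["pressure_high"]),
    ]
    # Aggregate the activation degrees of all rules.
    return max(rules)

def risk_level(m, threshold=0.5):
    """Simplified defuzzification: level 1 (risk) at or above the threshold, else 0."""
    return 1 if risk_degree(m) >= threshold else 0

calm = {"temperature_high": 0.1, "temperature_low": 0.2, "gas_high": 0.1, "pressure_high": 0.3}
tense = {"temperature_high": 0.9, "temperature_low": 0.0, "gas_high": 0.8, "pressure_high": 0.7}
print(risk_level(calm), risk_level(tense))
```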
S5: calling an improved radial basis function Bayesian network model, training the model and predicting data, performing risk assessment on predicted data, judging the obtained risk level, and further judging the safety of the environment in advance:
S5.1: performing dimension unified processing on the training set and the testing set, calling a model by using the training set to perform model training, and taking the testing set as the input of the model to obtain predicted data;
S5.2: and combining the predicted data with the fuzzy control system and the evaluation simulator in the step S4, performing risk evaluation on the data, and storing and displaying the data through a file.
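The storing part of S5.2 might look like the sketch below, which attaches a risk level to each predicted record and writes the result as CSV. The evaluator here is a simple range check against the S4.1 ranges and only stands in for the fuzzy evaluation; the field names, the sample predictions, and the in-memory buffer used instead of a real file are assumptions.

```python
import csv
import io

SAFE_RANGES = {
    "temperature": (-40.0, 60.0),
    "humidity": (0.0, 70.0),
    "gas_ppm": (0.0, 25.0),
    "pressure_mpa": (0.8, 1.6),
}

def evaluate(row):
    """Return 1 if any predicted value falls outside its range, else 0."""
    return int(any(not (lo <= row[k] <= hi) for k, (lo, hi) in SAFE_RANGES.items()))

def write_report(rows, handle):
    """Write each predicted record together with its risk level."""
    writer = csv.DictWriter(handle, fieldnames=[*SAFE_RANGES, "risk_level"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "risk_level": evaluate(row)})

# Hypothetical model predictions.
predicted = [
    {"temperature": 25.0, "humidity": 40.0, "gas_ppm": 5.0, "pressure_mpa": 1.0},
    {"temperature": 65.0, "humidity": 40.0, "gas_ppm": 30.0, "pressure_mpa": 1.7},
]
buffer = io.StringIO()  # stands in for an open file
write_report(predicted, buffer)
levels = [evaluate(r) for r in predicted]
print(buffer.getvalue())
```

Passing an open file object instead of the buffer would persist the same report to disk.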
Combining steps S3 and S4, the data in the environment data set are read and the improved radial basis kernel Bayesian network model is called. The four categories of parameters in the data set are first reduced from four dimensions to two using the PCA algorithm. The data are then weighted by the entropy weight method according to the degree of influence of each parameter on the safety of the storage tank. The model is then called to predict the data, the predicted data are risk-assessed against the evaluation rules, and the risk levels corresponding to the predicted data are obtained and displayed as a line graph, as shown in Fig. 5. A safety judgment is then made: when the risk assessment level reaches 1, an early warning is issued through a pop-up window and an alarm, which can greatly improve the safety of the chemical industry park.
The invention provides a chemical safety assessment method based on an improved radial basis kernel Bayesian network. The radial basis kernel Bayesian network is improved by combining the PCA algorithm and the entropy weight method, and a model is established to predict data. A fuzzy logic algorithm is used to set the risk assessment rules, the predicted data are risk-assessed against these rules, and a safety judgment is made on the result: when the risk assessment level equals 1 (the risk level being either 0 or 1), a warning and alarm prompt is issued through a pop-up window. Because the data are predicted from historical data, potential problems can be identified in advance, greatly improving the safety of the chemical industry park.
The chemical safety assessment method based on the improved radial basis function Bayesian network provided by the invention can be used in the chemical safety field and can also be applied to other data prediction fields.
The above description is only an example of the present invention and is not intended to limit it. All equivalents and alternatives falling within the spirit of the invention are intended to be included within its scope. Matters not elaborated in the present invention belong to the prior art known to those skilled in the art.
Claims (6)
1. The chemical safety assessment method based on the improved radial basis function Bayesian network is characterized by comprising the following steps of:
S1, carrying out environmental monitoring on a chemical industry park storage tank through a sensor, acquiring temperature, humidity, gas concentration and air pressure related environmental parameters in the storage tank, summarizing the environmental parameters to be used as a public data set, then carrying out data cleaning on the public data set, carrying out data normalization processing, and dividing a training set and a testing set to be used as input of a subsequent model;
S2, combining a PCA algorithm with an entropy weight method, carrying out weight analysis processing on the environmental data, and carrying out feature classification in view of the mutual influence among the environmental factors in the storage tank, so as to obtain a weight matrix for subsequent use;
S3, constructing a radial basis function Bayesian network model, setting model parameters, replacing the weights of the model with the processed weight matrix, performing data processing by taking a radial basis function as the activation function of the hidden layer, and transmitting the result to an output layer to serve as the output of the model;
S4, defining a fuzzy set and a membership function by combining a fuzzy logic algorithm, and defining a fuzzy control system and an evaluation simulator by combining the safety range of each feature of the data set to be used as a rule for risk evaluation judgment;
S5, calling the improved radial basis function Bayesian network model in S3, training the model and predicting data, performing risk assessment on the predicted data through an assessment system to obtain risk levels of the predicted data, judging the risk levels, and further judging the safety of the environment in advance.
2. The chemical safety assessment method based on the improved radial basis function Bayesian network of claim 1, wherein the specific steps of S1 are as follows:
S1.1, acquiring internal environment data of a storage tank in a chemical park, and storing the data in chemical_data.csv, wherein a data set comprises temperature, humidity, gas concentration and gas pressure parameters;
S1.2, reading data from chemical_data.csv, analyzing the data by using '\t' as a separator, and setting random seeds to ensure the reproducibility of experiments;
S1.3, dividing the data set into a training set and a testing set by utilizing the index of the data set, and then carrying out data normalization processing by using the MinMaxScaler class in the sklearn library;
S1.4, writing a format conversion function, wherein the function uses the NumPy library to convert the training set and the testing set into a three-dimensional array format accepted by the model.
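Steps S1.2 to S1.4 can be sketched as below. This is an illustration under stated assumptions: an in-memory string stands in for chemical_data.csv, the values and the 80/20 split ratio are hypothetical, min-max scaling is written by hand in NumPy in place of the sklearn MinMaxScaler class named in the claim, and the (samples, 1, features) layout is one possible reading of the three-dimensional format.

```python
import io
import numpy as np

np.random.seed(42)  # seed value is arbitrary; nothing below draws random numbers

# Stand-in for chemical_data.csv: tab-separated temperature, humidity, gas, pressure.
raw = io.StringIO(
    "20.0\t40.0\t5.0\t1.0\n"
    "22.0\t42.0\t6.0\t1.1\n"
    "25.0\t45.0\t8.0\t1.2\n"
    "30.0\t50.0\t12.0\t1.3\n"
    "35.0\t55.0\t20.0\t1.5\n"
)
data = np.loadtxt(raw, delimiter="\t")

# Index-based split: the first 80% of rows for training, the rest for testing.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# Min-max scaling with bounds taken from the training set only.
lo, hi = train.min(axis=0), train.max(axis=0)
train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)

def to_3d(a):
    """Reshape (samples, features) into (samples, 1, features)."""
    return a.reshape(a.shape[0], 1, a.shape[1])

train_3d, test_3d = to_3d(train_scaled), to_3d(test_scaled)
print(train_3d.shape, test_3d.shape)
```

Taking the scaling bounds from the training rows alone keeps information about the test rows out of the preprocessing.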
3. The chemical safety assessment method based on the improved radial basis function Bayesian network of claim 1, wherein the combination of the PCA algorithm and the entropy weight method in the step S2 comprises the following specific steps:
S2.1, performing principal component analysis (PCA): centering the initial weight matrix T to obtain a centered feature matrix $T_{center}$; then calculating the feature covariance matrix $C = \frac{1}{m} T_{center}^{T} T_{center}$, where the superscript T represents the transposition operation and m is the number of samples; and finally performing eigenvalue decomposition on the covariance matrix C to obtain an eigenvalue array $T_{value}$ and an eigenvector matrix $T_{vectors}$, and selecting the first k principal components, namely $T_{vectors}[:k]$, in descending order of eigenvalue;
S2.2, invoking the entropy weight method to calculate the weights: summing each feature row-wise to obtain the sum of feature values $F_{sum}$;
Calculating the proportion of each feature: $P_{ij} = T[i,j] / F_{sum}[i]$, where $T[i,j]$ represents a single feature value and $F_{sum}[i]$ represents the sum of the feature values;
Calculating the entropy value of each feature: where i represents the i-th feature, j represents the sample number, and m is the number of samples;
Calculating the weight of each feature: $W_i = H_i / \mathrm{sum}(H)$, where $W_i$ represents the weight of the i-th feature, $H_i$ represents the entropy of the i-th feature, and $\mathrm{sum}(H)$ represents the sum of the entropies of all features;
S2.3, calculating final weights by combining the principal components and the weights:
Calculating the final weight $W_f = C_{var}[:k] \cdot W[:k]$ from the cumulative contribution rate $C_{var}[:k]$ of the principal components and the feature weights $W[:k]$, as the weight matrix of the subsequent model.
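The PCA step and the final weight combination can be sketched as follows. The function name `pca_components`, the sample matrix, and the feature weights `W` are hypothetical, and the product of the cumulative contribution rates and the feature weights is taken element-wise, which is an assumption about how the final weight is formed.

```python
import numpy as np

def pca_components(T, k):
    """Top-k eigenvectors and cumulative contribution rates of a samples-by-features array."""
    T = np.asarray(T, dtype=float)
    T_center = T - T.mean(axis=0)                  # centering
    C = T_center.T @ T_center / T.shape[0]         # covariance with a 1/m factor
    values, vectors = np.linalg.eigh(C)            # eigh suits symmetric matrices
    order = np.argsort(values)[::-1]               # descending eigenvalues
    values, vectors = values[order], vectors[:, order]
    cum_var = np.cumsum(values) / values.sum()     # cumulative contribution rate
    return vectors[:, :k], cum_var[:k]

# Hypothetical readings: temperature, humidity, gas concentration, pressure.
T = np.array([
    [20.0, 40.0, 5.0, 1.0],
    [22.0, 42.0, 6.0, 1.1],
    [25.0, 45.0, 8.0, 1.2],
    [30.0, 50.0, 12.0, 1.3],
    [35.0, 55.0, 20.0, 1.5],
])
W = np.array([0.3, 0.3, 0.2, 0.2])  # hypothetical entropy-method feature weights
k = 2

vectors, cum_var = pca_components(T, k)
W_f = cum_var * W[:k]               # assumption: element-wise product
print(cum_var, W_f)
```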
4. A chemical safety assessment method based on an improved radial basis function bayesian network according to claim 3, wherein the specific steps of S3 are as follows:
Constructing an improved radial basis function Bayesian network algorithm model, wherein the radial basis function Bayesian network consists of three layers: the input layer, the hidden layer and the output layer, assuming that the input sample is x and the corresponding target output is y, the specific steps of model construction are as follows:
S3.1, calculating the activation value of the hidden layer:
For the i-th hidden layer neuron, the activation function is a Gaussian radial basis function: $\varphi_i(x) = \exp\left(-\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2}\right)$, where $\mu_i$ is the center point of the hidden layer neuron and $\sigma$ is the width of the radial basis function;
S3.2, calculating the weight from the hidden layer to the output layer:
The weight $w_j$ of the j-th neuron of the output layer is estimated by the least squares method: $w_j = \frac{\sum_i \alpha_i y_i \varphi_i(x)}{\sum_i \left(\alpha_i \varphi_i(x)\right)^2}$, where $\alpha_i$ is the weighting factor of the hidden layer neuron and $y_i$ is the target output of the i-th sample;
S3.3, predictive calculation of an output layer:
The predicted value $Y_{pre}$ of the output layer is obtained by multiplying the output of each hidden layer neuron by its corresponding weight and then summing: $Y_{pre} = \sum_j w_j \varphi_j(x)$, wherein j represents the index of the output layer neuron;
And S3.4, building the model by combining the calculation of the model network layer and the weight matrix obtained by the step S2.
5. The chemical safety assessment method based on the improved radial basis function Bayesian network of claim 1, wherein the specific steps of S4 are as follows:
S4.1, a fuzzy logic base scikit-fuzzy is called to create a fuzzy logic-based system, a fuzzy variable and a fuzzy set are defined by combining an internal environment data set of a storage tank, and safety standards of various factors including temperature, humidity, gas concentration and gas pressure are combined to divide the fuzzy variable and the fuzzy set into layers;
S4.2, establishing a membership function by combining the defined fuzzy set and the national issued storage tank internal environment data safety criterion;
and S4.3, defining a fuzzy control system and an evaluation simulator, and compiling a fuzzy logic rule by combining a membership function, so as to define the relation between an input fuzzy variable and an output fuzzy variable, thereby realizing the evaluation control system.
6. The chemical safety assessment method based on the improved radial basis function bayesian network according to claim 1, wherein the specific steps of training the model and predicting the data in S5 are as follows:
S5.1, carrying out dimension unified processing on the training set and the testing set, calling a model by using the training set to carry out model training, and taking the testing set as the input of the model to obtain predicted data;
And S5.2, combining the predicted data with the fuzzy control system and the evaluation simulator in the step S4, performing risk evaluation on the data, and storing and displaying the risk level through a file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410091657.7A CN117973852A (en) | 2024-01-22 | 2024-01-22 | Chemical safety assessment method based on improved radial basis function Bayesian network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117973852A true CN117973852A (en) | 2024-05-03 |
Family
ID=90860888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410091657.7A Pending CN117973852A (en) | 2024-01-22 | 2024-01-22 | Chemical safety assessment method based on improved radial basis function Bayesian network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117973852A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118365148A (en) * | 2024-06-19 | 2024-07-19 | 北京思路智园科技有限公司 | Chemical industry park dynamic risk monitoring and early warning method and system based on artificial intelligence |
-
2024
- 2024-01-22 CN CN202410091657.7A patent/CN117973852A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||