CN117973852A

CN117973852A - Chemical safety assessment method based on improved radial basis function Bayesian network

Info

Publication number: CN117973852A
Application number: CN202410091657.7A
Authority: CN
Inventors: 胡勇; 金鹰; 康瑶; 徐亿如; 张水星
Original assignee: Huaiyin Institute of Technology
Current assignee: Huaiyin Institute of Technology
Priority date: 2024-01-22
Filing date: 2024-01-22
Publication date: 2024-05-03

Abstract

The invention discloses a chemical safety assessment method based on an improved radial basis function Bayesian network. The method comprises the following steps:

- obtaining a public data set, processing it, and dividing it into a training set and a testing set;
- combining the PCA algorithm with the entropy weight method to perform weight analysis on the data and obtain a weight matrix;
- constructing a Bayesian network model based on a radial basis function, replacing the model's weights with the processed weight matrix, and using the radial basis function as the activation function of the hidden layer;
- using a fuzzy logic algorithm to define fuzzy sets and membership functions, and defining a fuzzy control system and an evaluation simulator from the data set to make risk-assessment judgments;
- constructing the improved radial basis function Bayesian network model, training it, predicting data, and performing risk assessment on the predicted data.

The method improves the correlation between data, effectively monitors potential hazards, and greatly improves the safety of a chemical industry park.

Description

Chemical safety assessment method based on improved radial basis function Bayesian network

Technical field:

The invention belongs to the field of chemical safety evaluation, and particularly relates to a chemical safety evaluation method based on an improved radial basis kernel Bayesian network.

Background

Chemical safety assessment systems are a central industrial safety tool. Their main purpose is to help chemical enterprises assess and manage their potential safety risks. Safety matters especially in the chemical industry because it involves complex chemical reactions and large amounts of hazardous substances.

In the past, many safety assessment systems relied heavily on manual input and operation. Specialized chemical engineers had to enter equipment parameters, operating conditions, and similar data by hand, and then perform complex calculations to obtain risk assessment results. This reliance on manual work is inefficient. It can also make the evaluation results inaccurate because of human error.

In recent years, with the rise of artificial intelligence, deep learning and machine learning have been widely applied to safety assessment, and these methods continue to be refined.

Bayesian networks and graph-theory methods are widely used in chemical safety evaluation. Because these algorithms are highly interpretable and efficient, they are often applied to prediction and classification.

Chemical safety evaluation requires correlation analysis of the environmental parameters of a chemical industry park. Bayesian networks are often applied here, but their accuracy is limited by high data requirements and high computational complexity. A multi-core Bayesian network method can be adopted to overcome these problems.

Replacing a single core with multiple cores introduces parallel computing. A multi-core Bayesian network can execute several computing tasks at the same time and distribute the workload across multiple cores or processors. This improves computational efficiency and speed and, in turn, model accuracy.

However, the multi-core Bayesian network also raises technical problems. For example, parallel computation across multiple cores increases model complexity. To address this, the PCA algorithm and the entropy weight method are used to reduce the dimensionality of the data and thus the computational complexity. PCA converts the data into a smaller number of principal components, which makes data processing and computation more efficient. This can greatly improve the model's parallel computing capacity and further improve its accuracy.

Purpose of the invention: to address the problems identified in the background art, the invention provides a chemical safety assessment method based on an improved radial basis kernel Bayesian network. The method aims to solve the low accuracy and instability of existing chemical risk assessment systems.

The technical scheme is as follows: the invention provides a chemical safety assessment method based on an improved radial basis function Bayesian network, which comprises the following steps:

S1, monitoring the environment of storage tanks in a chemical industry park with sensors to acquire environmental parameters inside the tanks, such as temperature, humidity, gas concentration and air pressure; summarizing these parameters as a public data set; then cleaning and normalizing the data and dividing it into a training set and a testing set, which serve as input to the subsequent model;

S2, combining the PCA algorithm with the entropy weight method to perform weight analysis on the environmental data, and classifying features according to the mutual influence among the environmental factors in the storage tank, so as to obtain a weight matrix for use in subsequent steps;

S3, constructing a radial basis function Bayesian network model, setting the model parameters, replacing the model's weights with the processed weight matrix, processing the data with a radial basis function as the activation function of the hidden layer, and passing the result to the output layer as the model output;

S4, defining fuzzy sets and membership functions using a fuzzy logic algorithm, and defining a fuzzy control system and an evaluation simulator based on the safety range of each feature in the data set; these serve as the rules for risk-assessment judgment;

S5, calling the improved radial basis function Bayesian network model from S3, training the model and predicting data, performing risk assessment on the predicted data through the assessment system to obtain and judge their risk levels, and thereby judging the safety of the environment in advance.

Further, the specific steps of S1 are as follows:

S1.1, acquiring internal environment data of a storage tank in a chemical park, and storing the data in chemical_data.csv, wherein a data set comprises temperature, humidity, gas concentration and gas pressure parameters;

S1.2, reading data from chemical_data.csv, parsing it with '\t' as the separator, and setting a random seed to ensure the reproducibility of experiments;

S1.3, dividing the data set into a training set and a testing set using the index of the data set, and then normalizing the data with the MinMaxScaler class in the sklearn library;

S1.4, writing a format conversion function that uses the NumPy library to convert the training set and the testing set into the three-dimensional array format accepted by the model.

Further, in step S2, the PCA algorithm is combined with the entropy weight method; the specific steps are as follows:

S2.1, performing Principal Component Analysis (PCA). The initial weight matrix $T$ is centered to obtain the centered feature matrix $T_{center}$. The feature covariance matrix $C = \frac{1}{m} T_{center}^{\top} T_{center}$ is then calculated, where $^{\top}$ denotes transposition and $m$ is the number of samples. Finally, eigenvalue decomposition of $C$ gives the eigenvalue array $T_{value}$ and the eigenvector matrix $T_{vectors}$, and the first $k$ principal components, namely T_vectors[:k], are selected in descending order of eigenvalues;

S2.2, invoking the entropy weight method to calculate the weights: each feature is summed by rows to obtain the sum of feature values $F_{sum}$;

Calculating the proportion of each feature: $p_{ij} = T[i,j] / F_{sum}[i]$, where $T[i,j]$ is a single feature value and $F_{sum}[i]$ is the sum of feature values.

Calculating the entropy value of each feature: $H_i = -\frac{1}{\ln m} \sum_{j=1}^{m} p_{ij} \ln p_{ij}$, where $i$ denotes the $i$-th feature, $j$ the sample index, and $m$ the number of samples.

Calculating the weight of each feature: $W_i = H_i / \mathrm{sum}(H)$, where $W_i$ is the weight of the $i$-th feature, $H_i$ is the entropy of the $i$-th feature, and $\mathrm{sum}(H)$ is the sum of the entropies of all features.

S2.3, calculating final weights by combining the principal components and the weights:

The final weight $W_f = C_{var}[:k] \cdot W[:k]$ is calculated from the cumulative contribution rate $C_{var}[:k]$ of the principal components and the feature weights $W[:k]$. It serves as the weight matrix of the subsequent model.

Further, the specific steps of S3 are as follows:

Constructing an improved radial basis function Bayesian network model. The radial basis function Bayesian network consists of three layers: the input layer, the hidden layer and the output layer. Assuming that the input sample is x and the corresponding target output is y, the model is constructed as follows:

S3.1, calculating the activation values of the hidden layer:

For the $i$-th hidden-layer neuron, the activation function is a Gaussian radial basis function, $\varphi_i(x) = \exp\left(-\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2}\right)$, where $\mu_i$ is the center point of the hidden-layer neuron and $\sigma$ is the width of the radial basis function;

S3.2, calculating the weight from the hidden layer to the output layer:

The weight $w_j$ of the $j$-th output-layer neuron can be estimated by the least squares method: $w_j = \frac{\sum_i \alpha_i y_i \varphi_i(x)}{\sum_i (\alpha_i \varphi_i(x))^2}$, where $\alpha_i$ is the weighting factor of the hidden-layer neuron and $y_i$ is the target output of the $i$-th sample;

S3.3, predictive calculation of an output layer:

The predicted value $Y_{pre}$ of the output layer is obtained by multiplying the output of each hidden-layer neuron by its corresponding weight and summing: $Y_{pre} = \sum_j w_j \varphi_j(x)$, where $j$ is the index of the output-layer neuron;

S3.4, building the model by combining the network-layer calculations with the weight matrix obtained in step S2.

Further, the specific steps of S4 are as follows:

S4.1, calling the fuzzy logic library scikit-fuzzy to create a fuzzy-logic-based system, defining fuzzy variables and fuzzy sets based on the storage tank internal environment data set, and dividing them into levels according to the safety standards for each factor, including temperature, humidity, gas concentration and gas pressure;

S4.2, establishing membership functions based on the defined fuzzy sets and the nationally issued safety criteria for storage tank internal environment data;

S4.3, defining a fuzzy control system and an evaluation simulator, and writing fuzzy logic rules based on the membership functions to define the relation between the input and output fuzzy variables, thereby realizing the evaluation control system.

Further, the specific steps of training the model in S5 and predicting the data are as follows:

S5.1, unifying the dimensions of the training set and the testing set, training the model on the training set, and feeding the testing set into the model to obtain predicted data;

S5.2, combining the predicted data with the fuzzy control system and the evaluation simulator from step S4 to perform risk evaluation, and saving the resulting risk levels to a file and displaying them.

The beneficial effects are that:

Based on an existing chemical industry park environment data set, the invention uses a radial basis kernel Bayesian network as the prediction model, with its weights derived by PCA and entropy-weight analysis. It combines this model with an evaluation system built with a fuzzy logic algorithm to assess the safety of the chemical industry park environment.

The data are first cleaned and normalized. The PCA algorithm and the entropy weight method then perform weight analysis on the data. The resulting weight matrix supplies the weights of the subsequent radial basis kernel Bayesian network, and the model is built with a Gaussian radial basis kernel as the activation function.

Using PCA and the entropy weight method for weight analysis can greatly improve the correlation of the data. It also reduces the data dimensionality and the number of attributes fed to the radial basis function Bayesian network model, which lowers the complexity of model training and computation. Reducing the dimensionality reduces the amount of data processed during training and improves training efficiency. Reducing the number of attributes simplifies the model's calculations, speeds up prediction, and improves the model's parallel computing capability.

In the invention, the model is built with a radial basis kernel Bayesian network algorithm, in which the single-core Bayesian network is made multi-core. Several processing cores execute multiple computing tasks at the same time, and the workload is distributed across multiple cores or processors, which improves computational efficiency and speed. The model can therefore reason and learn rapidly over large amounts of data, and its parallel computing capacity is improved. Compared with a conventional Bayesian network, it also has nonlinear modeling capability, which greatly improves its accuracy.

Moreover, compared with other algorithms, the model is expressed with one or more radial basis functions, which generally have an intuitive physical meaning. This makes the model more interpretable, makes its inference process and predictions easier to understand, and makes data prediction more accurate.

Drawings

FIG. 1 is a general flow chart of a chemical safety evaluation study based on a modified radial basis function Bayesian network;

FIG. 2 is a flow chart of a method of preprocessing a data set;

FIG. 3 is a flow chart of PCA and entropy weight analysis;

FIG. 4 is a radial basis function Bayesian network model construction flow chart;

FIG. 5 is a diagram of a risk assessment system.

Specific implementation steps

The application is further illustrated below with specific examples based on environmental parameter data from a chemical industry park. These examples illustrate the application and do not limit its scope. After reading the application, modifications of its various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.

The invention discloses a chemical safety assessment method based on an improved radial basis function Bayesian network, which comprises the following specific examples:

S1: The storage tanks of the chemical industry park are monitored with sensors to obtain environmental parameters inside the tanks, such as temperature, humidity, gas concentration and air pressure. The parameters are summarized as a public data set, which is then cleaned, normalized, and divided into a training set and a testing set as input to the subsequent model, as shown in FIG. 2:

S1.1: Acquire the internal environment data of the storage tanks in the chemical park, including parameters such as tank temperature, humidity, gas concentration and gas pressure, and store the data in chemical_data.csv.

S1.2: Read the data from chemical_data.csv, parsing it with '\t' as the separator, and set a random seed so that the experiments are reproducible.
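The sketch below illustrates S1.2: it parses tab-separated data with NumPy and fixes the random seeds. It is not the authors' code. The seed value, the column names and the in-memory sample that stands in for chemical_data.csv are illustrative assumptions. With pandas, `pd.read_csv("chemical_data.csv", sep="\t")` would be the usual equivalent.

```python
import io
import random

import numpy as np

# Fix the random seeds so later random operations (splits, initialisation) are reproducible.
SEED = 42  # hypothetical value; the source does not state the seed
random.seed(SEED)
np.random.seed(SEED)

# Stand-in for chemical_data.csv: tab-separated, one header row.
# Column names and values are illustrative only.
sample_file = io.StringIO(
    "temperature\thumidity\tgas_concentration\tgas_pressure\n"
    "25.1\t45.0\t3.2\t1.05\n"
    "26.4\t47.5\t3.8\t1.10\n"
    "24.9\t44.2\t2.9\t1.02\n"
)

# Parse with '\t' as the separator and skip the header row.
data = np.loadtxt(sample_file, delimiter="\t", skiprows=1)
print(data.shape)  # (3, 4)
```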

S1.3: Divide the data set into a training set and a testing set using its index, then normalize the data with the MinMaxScaler class in the sklearn library.

First, a MinMaxScaler object is created.

Second, the training data are normalized. Let the original training data be $X_{train}$, and let the minimum and maximum of the $i$-th feature be $X_{trainmin_i}$ and $X_{trainmax_i}$. For each sample $j$, the normalized value $X_{trainscaled_{ij}}$ of the $i$-th feature is:

$X_{trainscaled_{ij}} = \frac{X_{train_{ij}} - X_{trainmin_i}}{X_{trainmax_i} - X_{trainmin_i}}$

where $i \in [1, n]$, $j \in [1, m]$, $n$ is the number of features, and $m$ is the number of samples in the training data set.

Finally, the test data are normalized. Let the original test data be $X_{test}$. The minimum and maximum of the $i$-th feature are again $X_{trainmin_i}$ and $X_{trainmax_i}$, taken from the training data. For each sample $j$, the normalized value $X_{testscaled_{ij}}$ of the $i$-th feature is:

$X_{testscaled_{ij}} = \frac{X_{test_{ij}} - X_{trainmin_i}}{X_{trainmax_i} - X_{trainmin_i}}$

where $i \in [1, n]$, $j \in [1, m]$, $n$ is the number of features, and $m$ is the number of samples in the test data set.
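The normalization formulas above can be sketched in NumPy as follows. The sketch assumes an ordered, index-based split with a hypothetical `n_train` cut-off. Fitting sklearn's `MinMaxScaler` on the training set and calling `transform` on both sets computes the same values. Because only training statistics are used, test values may fall outside [0, 1].

```python
import numpy as np

def split_by_index(data, n_train):
    """Split rows in order: the first n_train rows are training data, the rest test data."""
    return data[:n_train], data[n_train:]

def min_max_scale(X_train, X_test):
    """Scale both sets with the per-feature min/max of the TRAINING set only."""
    train_min = X_train.min(axis=0)
    train_max = X_train.max(axis=0)
    span = train_max - train_min
    span[span == 0] = 1.0  # avoid division by zero for constant features
    X_train_scaled = (X_train - train_min) / span
    X_test_scaled = (X_test - train_min) / span
    return X_train_scaled, X_test_scaled

data = np.array([[20.0, 40.0], [30.0, 50.0], [25.0, 45.0], [40.0, 60.0], [35.0, 55.0]])
X_train, X_test = split_by_index(data, n_train=3)
X_train_s, X_test_s = min_max_scale(X_train, X_test)
print(X_train_s)
print(X_test_s)  # values above 1 show test points outside the training range
```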

S1.4: Write a format conversion function that uses the NumPy library to convert the training set and the testing set into the three-dimensional array format accepted by the model.
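The source does not state which three-dimensional layout the model expects. The sketch below assumes the common (samples, timesteps, features) layout used by sequence models, controlled by a hypothetical `timesteps` parameter. With `timesteps=1`, it adds a length-1 time axis. With larger values, it builds sliding windows over consecutive rows.

```python
import numpy as np

def to_3d(X, timesteps=1):
    """Reshape a 2-D (samples, features) array into 3-D (samples, timesteps, features).

    timesteps=1: each sample becomes a length-1 sequence.
    timesteps>1: consecutive rows are grouped into overlapping windows
    (a hypothetical choice; the source does not specify the layout).
    """
    X = np.asarray(X, dtype=float)
    if timesteps == 1:
        return X.reshape(X.shape[0], 1, X.shape[1])
    windows = [X[i:i + timesteps] for i in range(len(X) - timesteps + 1)]
    return np.stack(windows)

X = np.arange(20, dtype=float).reshape(5, 4)
print(to_3d(X).shape)               # (5, 1, 4)
print(to_3d(X, timesteps=2).shape)  # (4, 2, 4)
```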

S2: The PCA algorithm is combined with the entropy weight method to perform weight analysis on the environmental data. Features are classified according to the mutual influence among the environmental factors in the storage tank to obtain a weight matrix for subsequent use, as shown in FIG. 3:

S2.1: Perform Principal Component Analysis (PCA). Center the initial weight matrix $T$ to obtain the centered feature matrix $T_{center}$, then calculate the feature covariance matrix:

$C = \frac{1}{m} T_{center}^{\top} T_{center}$

where $^{\top}$ denotes transposition and $m$ is the number of samples. Finally, perform eigenvalue decomposition on the covariance matrix $C$ to obtain the eigenvalue array $T_{value}$ and the eigenvector matrix $T_{vectors}$. The first $k$ principal components, T_vectors[:k], are selected in descending order of eigenvalues.

Finally, the cumulative contribution rate $C_{var}[:k]$ of the principal components is calculated, where

$C_{var_1} = T_{value}[0] / \mathrm{sum}(T_{value})$

$C_{var_i} = C_{var_{i-1}} + T_{value}[i-1] / \mathrm{sum}(T_{value})$

Here $C_{var_1}$ is the proportion of the explained variance accounted for by the first principal component. $C_{var_i}$ is the cumulative proportion of explained variance of the first $i$ principal components, with $i$ increasing from 2 to $k$. $C_{var_k}$ is therefore the cumulative proportion of variance explained by the first $k$ principal components.
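S2.1 and the cumulative contribution rate can be sketched with NumPy's symmetric eigensolver. This is an illustrative implementation under stated assumptions:

- `T` holds samples in rows and features in columns.
- `np.linalg.eigh` returns eigenvectors as columns, so after sorting, the first k components are taken as `T_vectors[:, :k]`.
- The projection `T_center @ components` plays the role of the reduced data set (X_transformed) mentioned in the embodiment.

```python
import numpy as np

def pca_with_contribution(T, k):
    """PCA by eigendecomposition of the covariance matrix (S2.1).

    T: array of shape (m samples, n features).
    Returns the top-k eigenvectors (as columns), all eigenvalues in descending
    order, the cumulative contribution rates C_var[:k], and the projected data.
    """
    T = np.asarray(T, dtype=float)
    m = T.shape[0]
    T_center = T - T.mean(axis=0)                # center every feature
    C = (T_center.T @ T_center) / m              # covariance with 1/m normalisation
    T_value, T_vectors = np.linalg.eigh(C)       # eigh: C is symmetric
    order = np.argsort(T_value)[::-1]            # descending eigenvalues
    T_value, T_vectors = T_value[order], T_vectors[:, order]
    C_var = np.cumsum(T_value) / T_value.sum()   # C_var_i = C_var_{i-1} + T_value[i-1] / sum
    components = T_vectors[:, :k]
    X_transformed = T_center @ components        # samples expressed in k components
    return components, T_value, C_var[:k], X_transformed

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))                 # e.g. four environmental features
components, T_value, C_var, X_transformed = pca_with_contribution(data, k=2)
print(X_transformed.shape)  # (100, 2)
```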

S2.2: Invoke the entropy weight method to calculate the weights, summing each feature by rows to obtain the sum of feature values $F_{sum}$.

The sum of feature values is computed as F_sum = X.sum(axis=1). If the feature matrix X has shape (n, m), then F_sum has shape (n,), and each element F_sum[i] is the sum of the values of the $i$-th feature over all samples.
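The following is a minimal sketch of the entropy weight calculation. Features are in rows, as in the F_sum description, and values are assumed non-negative (for example after min-max scaling). The two branches differ as follows:

- The default branch follows the text's formula $W_i = H_i / \mathrm{sum}(H)$.
- The `standard=True` branch shows the textbook entropy weight method, which weights by the divergence 1 - H_i. Under that method, features whose values vary more across samples (lower entropy) receive larger weights.

The example shows that the two rules can rank features in opposite order.

```python
import numpy as np

def entropy_weights(X, standard=False):
    """Entropy weight method following S2.2.

    X: (n features, m samples) with non-negative values and non-zero row sums.
    standard=False: W_i = H_i / sum(H), as written in the text.
    standard=True:  W_i = (1 - H_i) / sum(1 - H), the common textbook form.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    F_sum = X.sum(axis=1)                    # row sum for each feature
    p = X / F_sum[:, None]                   # p_ij = X[i, j] / F_sum[i]
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)  # treat 0 * log(0) as 0
    H = -plogp.sum(axis=1) / np.log(m)       # normalised entropy, 0 <= H_i <= 1
    if standard:
        d = 1.0 - H
        return d / d.sum()
    return H / H.sum()

# Feature 0 is uniform across samples (H = 1); feature 1 is concentrated (H = 0).
X = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])
print(entropy_weights(X))                 # weight goes to the uniform feature
print(entropy_weights(X, standard=True))  # weight goes to the concentrated feature
```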

S2.3: the final weight is calculated by combining the principal components and the weights:

Based on the cumulative contribution rate of the principal components $C_{var}[:k]$ and the feature weights $W[:k]$, the final weight $W_f = C_{var}[:k] \cdot W[:k]$ is calculated and used as the weight analysis matrix of the subsequent model.
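The text does not say whether the product in $W_f$ is element-wise or a dot product. The sketch below assumes an element-wise product of the first k entries of each array, which yields one weight per retained position. The helper name `final_weights` is hypothetical.

```python
import numpy as np

def final_weights(C_var, W, k):
    """W_f = C_var[:k] * W[:k], read here as an element-wise product (assumption)."""
    return np.asarray(C_var[:k], dtype=float) * np.asarray(W[:k], dtype=float)

C_var = np.array([0.8, 1.0])          # cumulative contribution rates for k = 2
W = np.array([0.3, 0.2, 0.4, 0.1])    # entropy weights of the four features
print(final_weights(C_var, W, k=2))
```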

Applied to the storage tank internal environment data set, PCA reduces the data from four dimensions to two, yielding a new data set X_transformed in which each sample has only two features.

Next, the weight coefficient of each feature is calculated with the entropy weight method. For each of the four features (temperature, humidity, gas concentration and gas pressure), the occurrence probability distribution p is computed, and its entropy is calculated from the definition of information entropy. Finally, the weight coefficient $W_i$ of each feature is calculated as the ratio of its entropy to the total entropy. These coefficients reflect the importance of each feature in the data set and can be used to adjust each feature's influence, which yields the weight matrix of the data set.

S3: Construct an improved radial basis function Bayesian network model and set the model parameters. Replace the model's weights with the processed weight matrix, process the data with a radial basis function as the activation function of the hidden layer, and pass the result to the output layer as the model output, as shown in FIG. 4:

S3.1: Calculate the activation values of the hidden layer:

For the $i$-th hidden-layer neuron, the activation function is a Gaussian radial basis function (Radial Basis Function): $\varphi_i(x) = \exp\left(-\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2}\right)$, where $\mu_i$ is the center point of the hidden-layer neuron and $\sigma$ is the width (standard deviation) of the radial basis function.
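The hidden-layer activation can be sketched for a batch of inputs as below. The text does not specify how the centers $\mu_i$ and the width $\sigma$ are chosen. Here they are simply passed in; clustering or sampling training points are common choices.

```python
import numpy as np

def rbf_hidden(X, centers, sigma):
    """Gaussian RBF activations phi_i(x) = exp(-||x - mu_i||^2 / (2 sigma^2)).

    X: (n_samples, d), centers: (n_hidden, d). Returns (n_samples, n_hidden).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    centers = np.asarray(centers, dtype=float)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

centers = np.array([[0.0, 0.0], [1.0, 0.0]])
print(rbf_hidden(np.array([[0.0, 0.0]]), centers, sigma=1.0))
```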

S3.2: weight calculation from hidden layer to output layer:

The weight $w_j$ of the $j$-th output-layer neuron can be estimated by the least squares method: $w_j = \frac{\sum_i \alpha_i y_i \varphi_i(x)}{\sum_i (\alpha_i \varphi_i(x))^2}$, where $\alpha_i$ is the weighting factor of the hidden-layer neuron and $y_i$ is the target output of the $i$-th sample.

S3.3: predictive calculation of the output layer:

The predicted value $Y_{pre}$ of the output layer is obtained by multiplying the output of each hidden-layer neuron by its corresponding weight and summing: $Y_{pre} = \sum_j w_j \varphi_j(x)$, where $j$ is the index of the output-layer neuron.
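The closed-form weight expression above mixes sample and neuron indices, so this sketch does not reproduce it literally. Instead it uses the standard ordinary least-squares fit of RBF output weights. It minimizes ||Phi w - y||^2 over the hidden-layer design matrix Phi with `np.linalg.lstsq`, then computes Y_pre as the weighted sum of hidden-layer outputs. The choice of the first 10 training samples as centers and sigma = 0.3 are illustrative assumptions.

```python
import numpy as np

def gaussian_design_matrix(X, centers, sigma):
    """Phi[n, j] = exp(-||x_n - mu_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_output_weights(X, y, centers, sigma):
    """Ordinary least squares: minimise ||Phi w - y||^2 over the output weights w."""
    Phi = gaussian_design_matrix(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict(X, w, centers, sigma):
    """Y_pre = sum_j w_j * phi_j(x) for every sample x."""
    return gaussian_design_matrix(X, centers, sigma) @ w

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
centers = X[:10]                  # hypothetical: first 10 training samples as centers
w = fit_output_weights(X, y, centers, sigma=0.3)
Y_pre = predict(X, w, centers, sigma=0.3)
print(Y_pre.shape)  # (50,)
```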

S3.4: combining the calculation of the model network layer and the weight matrix obtained by the step S2, and building the model, wherein:

Assume that the original feature matrix is $X \in \mathbb{R}^{n \times m}$ and the feature weight array is $W \in \mathbb{R}^{m}$, where $n$ is the number of samples and $m$ is the number of features.

The new feature matrix obtained by combining the PCA algorithm and the entropy weight method is:

$\tilde{X} = [\tilde{x}_{ij}], \quad \tilde{x}_{ij} = w_j x_{ij}$

where $w_j$ is the weight of the $j$-th feature and $x_{ij}$ is the $j$-th feature value of the $i$-th sample.
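The weighted feature matrix amounts to scaling column j of X by w_j. NumPy broadcasting does this directly, so the sketch below is equivalent to `X @ np.diag(W)`. The helper name `weight_features` is hypothetical.

```python
import numpy as np

def weight_features(X, W):
    """x_tilde[i, j] = w_j * x[i, j]; X is (n samples, m features), W has length m."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    if W.shape != (X.shape[1],):
        raise ValueError("W must contain one weight per feature (column) of X")
    return X * W  # broadcasting multiplies column j by w_j

X = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([10.0, 0.5])
print(weight_features(X, W))
```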

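Let the training data set be $D_{train} = \{(x_i, y_i)\}_{i=1}^{n}$, where $X \in \mathbb{R}^{n \times m}$ represents the sample features and $y_i \in \{c_1, c_2, \ldots, c_k\}$ represents the sample class. The test data set is denoted $D_{test}$.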

The training data set can be standardized or normalized to obtain a new feature matrix $\tilde{X}$. A radial basis kernel Bayes classifier is then established with the Gaussian naive Bayes algorithm; the model is as follows:

where $\gamma$ is a hyperparameter, $\tilde{x}_i$ is the vector obtained by standardizing or normalizing the feature values of the $i$-th sample in the training data set, and $I(y_i = c_k)$ is an indicator function that equals 1 when $y_i = c_k$ and 0 otherwise.
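The model equation itself did not survive in the text. The remaining symbols are consistent with a kernel-weighted class score, sum_i exp(-gamma * ||x - x_i||^2) * I(y_i = c_k), normalized over classes. Those symbols are:

- the hyperparameter gamma,
- the normalized training vectors,
- the indicator I(y_i = c_k).

The sketch below implements that reading as an assumption, not as the patent's exact model; it is essentially a Parzen-window classifier. The `gamma` value in the usage is arbitrary.

```python
import numpy as np

def rbf_kernel_class_scores(x, X_train, y_train, classes, gamma):
    """Assumed model: score_k(x) = sum_i exp(-gamma * ||x - x_i||^2) * I(y_i == c_k),
    normalised over the classes so that the scores sum to 1."""
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    sim = np.exp(-gamma * ((X_train - x) ** 2).sum(axis=1))        # RBF similarity to each training sample
    scores = np.array([sim[y_train == c].sum() for c in classes])  # indicator keeps class-c samples only
    total = scores.sum()
    if total == 0:                                                 # all similarities underflowed
        return np.full(len(classes), 1.0 / len(classes))
    return scores / total

def predict_class(x, X_train, y_train, classes, gamma):
    """Return the class with the highest normalised kernel score."""
    scores = rbf_kernel_class_scores(x, X_train, y_train, classes, gamma)
    return classes[int(np.argmax(scores))]

X_tr = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])  # normalised features
y_tr = np.array([0, 0, 1, 1])                                      # 0 = normal, 1 = risk
classes = [0, 1]
print(predict_class([0.05, 0.0], X_tr, y_tr, classes, gamma=5.0))
```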

Combining the above steps establishes the improved radial basis kernel Bayesian network model.

S4: Define fuzzy sets and membership functions using a fuzzy logic algorithm, and define a fuzzy control system and an evaluation simulator based on the safety range of each feature in the data set. These serve as the rules for risk-assessment judgment. The specific steps are as follows:

S4.1, calling the fuzzy logic library scikit-fuzzy to create a fuzzy-logic-based system, defining fuzzy variables and fuzzy sets based on the storage tank internal environment data set, and dividing them into levels according to the safety standards for each factor, including temperature, humidity, gas concentration and air pressure, to facilitate subsequent risk assessment. The numerical ranges of the parameters are set, for example, as follows:

Temperature: -40 ℃ to 60 ℃;

Humidity: 0% to 70%;

Gas concentration (ammonia): 0 ppm to 25 ppm, where ppm (parts per million) is a concentration unit; a concentration of 1 ppm means the substance makes up one part per million of the measured sample;

Air pressure: 0.8 MPa to 1.6 MPa.

S4.2, establishing membership functions as the basis for subsequent risk assessment, using the defined fuzzy sets and the nationally issued safety criteria for storage tank internal environment data. The safety range of each parameter is divided into different degrees, such as low and high temperature, low and high humidity, low and high concentration, and low and high pressure. The dividing criteria are set according to the specific parameters.
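The membership functions can be sketched with trapezoids over the temperature range given in S4.1. scikit-fuzzy provides `skfuzzy.trapmf` and `skfuzzy.trimf` for this purpose. To stay dependency-free, the version below re-implements a trapezoid in NumPy. The breakpoints for "low", "normal" and "high" are hypothetical placeholders, not the national safety criteria the text refers to.

```python
import numpy as np

def trapmf(x, a, b, c, d):
    """Trapezoidal membership: rises from a to b, equals 1 on [b, c], falls from c to d.

    When a == b (or c == d) that edge is open: membership stays 1 on that side,
    which gives the left (or right) shoulder shape used for "low" and "high".
    """
    x = np.asarray(x, dtype=float)
    rise = np.ones_like(x) if b == a else np.clip((x - a) / (b - a), 0.0, 1.0)
    fall = np.ones_like(x) if d == c else np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(rise, fall)

# Hypothetical fuzzy sets over the temperature universe -40 to 60 degrees C.
TEMPERATURE_SETS = {
    "low": (-40, -40, -10, 5),
    "normal": (-10, 5, 35, 45),
    "high": (35, 45, 60, 60),
}

def fuzzify(value, sets):
    """Return the membership degree of a crisp value in every fuzzy set."""
    return {name: float(trapmf(value, *params)) for name, params in sets.items()}

print(fuzzify(0.0, TEMPERATURE_SETS))
```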

S4.3, defining a fuzzy control system and an evaluation simulator, and writing fuzzy logic rules based on the membership functions to define the relation between the input and output fuzzy variables, thereby realizing the evaluation control system. The system provides a set of fuzzy rules that connect the input and output variables; the rules determine the fuzzy value of the output from the fuzzy values of the inputs.

To determine the rules, the degree of influence of each factor on storage tank safety is set. For example, a change in temperature may change the air pressure, and a change in gas concentration may also change the air pressure, which can in turn cause the storage tank to explode. Temperature, gas concentration and air pressure therefore have the greatest influence, and the rules must take them into account when dividing risk comprehensively. For example, an excessively high or low temperature increases the safety risk of the storage tank, and the same holds for gas concentration and gas pressure. All factors must nevertheless be considered together for a comprehensive evaluation.

The risk level takes two values: 0 (normal) and 1 (at risk). When the evaluated risk level is 1, the data at that moment indicate elevated risk. Whether the storage tank is at risk can thus be judged in advance, which greatly improves chemical safety.

Each rule consists of a condition part and a conclusion part. The condition part combines the input variables with logical operators (e.g., "AND" and "OR"). The conclusion part gives the fuzzy value of the output variable.

Given specific values of a set of input variables, the contribution of each rule to the final output is determined by calculating its activation degree. The fuzzy value of the output variable is then obtained by aggregating all activated rules and defuzzifying the result. The behavior of the fuzzy control system can thus be described by its rules and membership functions, after which the safety evaluation is performed.
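The rule evaluation described above (activation degree, aggregation, defuzzification) can be sketched as a small Mamdani-style inference. In scikit-fuzzy this would typically be built with `skfuzzy.control.Antecedent`, `Consequent`, `Rule`, `ControlSystem` and `ControlSystemSimulation`. The sketch below spells out the steps in NumPy instead. The two rules, the output sets, and the 0.5 threshold that maps the crisp score to risk level 0 or 1 are all hypothetical.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

RISK = np.linspace(0.0, 1.0, 101)       # output universe: 0 = normal, 1 = risk
RISK_LOW = tri(RISK, -0.5, 0.0, 0.5)    # hypothetical output set "normal"
RISK_HIGH = tri(RISK, 0.5, 1.0, 1.5)    # hypothetical output set "risk"

def evaluate(temp_high, gas_high, press_high):
    """Mamdani-style inference on already-fuzzified input degrees in [0, 1].

    Hypothetical rules:
      R1: IF pressure is high AND (temperature is high OR gas is high) THEN risk is high
      R2: IF pressure is NOT high THEN risk is low
    Returns (crisp risk score, risk level 0 or 1).
    """
    a1 = min(press_high, max(temp_high, gas_high))  # AND -> min, OR -> max
    a2 = 1.0 - press_high                           # NOT -> complement
    # Clip each consequent at its rule's activation degree, then aggregate with max.
    aggregated = np.maximum(np.minimum(RISK_HIGH, a1), np.minimum(RISK_LOW, a2))
    if aggregated.sum() == 0:
        return 0.0, 0
    crisp = float((RISK * aggregated).sum() / aggregated.sum())  # centroid defuzzification
    level = 1 if crisp >= 0.5 else 0                             # map to risk level 0 / 1
    return crisp, level

print(evaluate(temp_high=0.8, gas_high=0.2, press_high=0.9))
```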

S5: Call the improved radial basis function Bayesian network model, train it and predict data, perform risk assessment on the predicted data, judge the resulting risk levels, and thereby judge the safety of the environment in advance:

S5.1: Unify the dimensions of the training set and the testing set, train the model on the training set, and feed the testing set into the model to obtain predicted data;

S5.2: Combine the predicted data with the fuzzy control system and the evaluation simulator from step S4 to perform risk evaluation on the data, then save the results to a file and display them.

Combining steps S3 and S4, the data in the environment data set are read and the improved radial basis kernel Bayesian network model is called. The processing runs in order:

1. The PCA algorithm reduces the four parameter categories in the data set from four dimensions to two.
2. Taking into account each parameter's influence on storage tank safety, the entropy weight method performs weight analysis on the data.
3. The model is called to predict the data, and risk assessment is performed on the predicted data according to the evaluation rules.
4. The resulting risk levels are displayed in a line graph, as shown in FIG. 5.

A safety judgment is then made. When the risk assessment level reaches 1, an early warning is issued through a pop-up alert window, which can greatly improve the safety of the chemical industry park.

The invention provides a chemical safety assessment method based on an improved radial basis kernel Bayesian network. The radial basis kernel Bayesian network is improved by combining the PCA algorithm and the entropy weight method, and a model is built to predict data. A fuzzy logic algorithm sets the risk-assessment judgment rules, and the predicted data are evaluated against these rules. When the risk assessment level equals 1 (risk levels take the values 0 and 1), a warning and alarm are shown in a pop-up dialog box. Because the data are predicted from historical data, potential problems can be identified in advance, which greatly improves the safety of the chemical industry park.

The chemical safety assessment method based on the improved radial basis function Bayesian network provided by the invention can be used in the chemical safety field and can also be applied to other data prediction fields.

The above description is only an example of the present invention and is not intended to limit the present invention. All equivalents and alternatives falling within the spirit of the invention are intended to be included within the scope of the invention. What is not elaborated on the invention belongs to the prior art which is known to the person skilled in the art.

Claims

1. The chemical safety assessment method based on the improved radial basis function Bayesian network is characterized by comprising the following steps of:

2. The chemical safety assessment method based on the improved radial basis function Bayesian network of claim 1, wherein the specific steps of S1 are as follows:

3. The chemical safety assessment method based on the improved radial basis function Bayesian network of claim 1, wherein the combination of the PCA algorithm and the entropy weight method in the step S2 comprises the following specific steps:

S2.1, performing Principal Component Analysis (PCA). The initial weight matrix $T$ is centered to obtain the centered feature matrix $T_{center}$. The feature covariance matrix $C = \frac{1}{m} T_{center}^{\top} T_{center}$ is then calculated, where $^{\top}$ denotes transposition and $m$ is the number of samples. Finally, eigenvalue decomposition of $C$ gives the eigenvalue array $T_{value}$ and the eigenvector matrix $T_{vectors}$, and the first $k$ principal components, namely T_vectors[:k], are selected in descending order of eigenvalues;

Calculating the proportion of each feature: $p_{ij} = T[i,j] / F_{sum}[i]$, where $T[i,j]$ is a single feature value and $F_{sum}[i]$ is the sum of feature values;

Calculating the entropy value of each feature: $H_i = -\frac{1}{\ln m} \sum_{j=1}^{m} p_{ij} \ln p_{ij}$, where $i$ denotes the $i$-th feature, $j$ the sample index, and $m$ the number of samples;

Calculating the weight of each feature: $W_i = H_i / \mathrm{sum}(H)$, where $W_i$ is the weight of the $i$-th feature, $H_i$ is the entropy of the $i$-th feature, and $\mathrm{sum}(H)$ is the sum of the entropies of all features;

4. The chemical safety assessment method based on the improved radial basis function Bayesian network according to claim 3, wherein the specific steps of S3 are as follows:

S3.1, calculating the activation values of the hidden layer:

S3.2, calculating the weight from the hidden layer to the output layer:

The weight $w_j$ of the $j$-th output-layer neuron is estimated by the least squares method: $w_j = \frac{\sum_i \alpha_i y_i \varphi_i(x)}{\sum_i (\alpha_i \varphi_i(x))^2}$, where $\alpha_i$ is the weighting factor of the hidden-layer neuron and $y_i$ is the target output of the $i$-th sample;

S3.3, predictive calculation of an output layer:

5. The chemical safety assessment method based on the improved radial basis function Bayesian network of claim 1, wherein the specific steps of S4 are as follows:

6. The chemical safety assessment method based on the improved radial basis function Bayesian network according to claim 1, wherein the specific steps of training the model and predicting the data in S5 are as follows: