CN113866204A

CN113866204A - A method for quantitative analysis of heavy metals in soil based on Bayesian regularization

Info

Publication number: CN113866204A
Application number: CN202111132874.9A
Authority: CN
Inventors: 李福生; 程惠珠; 杨婉琪; 赵彦春; 曾小龙; 马骞
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2021-12-31

Abstract

The invention provides a method for quantitative analysis of heavy metals in soil based on Bayesian regularization. Discrete wavelets are used to denoise the X-ray fluorescence spectra of soil samples and subtract their background, yielding processed spectral information. The Compton normalization method is then used to calculate, from the processed spectral information, the composition information of the heavy metal elements and of preset interfering elements. This composition information is used as the input of a BP neural network, and the actual content of the corresponding heavy metal element is used as the model output. The hyperparameters used in training the BP neural network are determined by Bayesian regularization, and the network uses a regularization-corrected error function as its objective function during training. The proposed method can effectively improve the accuracy of quantitative analysis of heavy metal elements in soil and has advantages over the traditional BP neural network method.

Description

Bayesian regularization-based soil heavy metal quantitative analysis method

Technical Field

The invention relates to an X-ray fluorescence spectrum analysis technology, in particular to a quantitative analysis technology based on Bayesian regularization.

Background

Soil is one of the basic elements of an ecosystem and is closely related to food security, ecological security and human health. Because heavy metals in soil accumulate easily, are difficult to treat and have poor chemical stability, they have become an important component of inorganic soil pollutants. Improving the accuracy of quantitative analysis of heavy metals in soil is therefore of great significance for guiding pollution control. Among the methods for determining elemental composition and content, X-ray fluorescence (XRF) spectrometry is widely used in many fields because it is fast, low-cost and suitable for large-area monitoring.

The content of the heavy metals to be detected is usually very low, and the traditional XRF method is subject to interference from the background and from the signals of neighboring elements when predicting heavy metal elements in soil, so the characteristic peak of an element is easily submerged in background noise. Compared with traditional methods for studying the relationship between element composition information and content, the BP neural network is highly robust to noisy data and can approximate any nonlinear relationship. Because of these advantages, accurately determining the heavy metal content of soil with artificial intelligence algorithms such as neural networks has become a research hotspot in the field of XRF analysis. Research focuses mainly on two points: finding a suitable neural network data modeling method, and designing the analysis process so as to further optimize the modeling method.

The existing standard BP neural network algorithm suffers from overfitting, slow convergence, convergence to local minima and similar problems, and it uses the mean square error as the objective function of the model. As a result, its analysis of soil heavy metal content is poor and can hardly meet accuracy requirements.

Disclosure of Invention

The invention aims to provide a method for improving the accuracy of XRF analysis of heavy metal elements in soil.

The technical solution adopted by the invention is a method for quantitative analysis of heavy metals in soil based on Bayesian regularization, comprising the following steps:

1) denoising the X-ray fluorescence spectrum samples of the soil samples and subtracting their background using discrete wavelets to obtain processed spectral information;

2) calculating the composition information of the heavy metal elements and preset interfering elements in the processed spectral information using the Compton normalization method, using the composition information of the heavy metal elements and of the interfering elements as the respective inputs of a BP neural network, and using the actual content of the corresponding heavy metal element as the output of the model;

the hyperparameters used in training the BP neural network are determined by Bayesian regularization, and during training the BP neural network uses a regularization-corrected error function as the objective function F(W), where F(W) = βE_D + αE_W;

E_D is the original error function:

f(x_i, W) is the output value of the BP neural network, t_i is the actual content of the heavy metal element in the i-th sample, N is the total number of training samples, and W is the parameter vector of the neural network;

E_W is the decay function:

α and β are hyperparameters determined by the Bayesian algorithm:

M is the total number of parameters of the BP neural network, γ is the number of effective parameters, and W_MP is the minimum point of the objective function F(W), at which its gradient is zero.
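The formulas themselves are not reproduced in this text. As a hedged reconstruction only, the standard Bayesian regularization formulation (MacKay; Foresee and Hagan) that matches the symbols defined above would read as follows. The normalization constants and the symbol w_j for the j-th network parameter are assumptions and are not taken from the patent:

```latex
\begin{aligned}
F(W) &= \beta E_D + \alpha E_W \\
E_D  &= \sum_{i=1}^{N} \bigl( f(x_i, W) - t_i \bigr)^2 \\
E_W  &= \sum_{j=1}^{M} w_j^2 \\
\gamma &= M - 2\alpha \,\operatorname{tr}\!\Bigl[ \bigl( \nabla^2 F(W_{MP}) \bigr)^{-1} \Bigr] \\
\alpha &= \frac{\gamma}{2 E_W(W_{MP})}, \qquad
\beta = \frac{N - \gamma}{2 E_D(W_{MP})}
\end{aligned}
```

In this form, γ lies between 0 and M and measures how many parameters are effectively determined by the data rather than by the decay term.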

Correcting the objective function by Bayesian regularization improves the generalization ability of the neural network. Introducing Bayesian regularization has several advantages: (1) the effective weights of the network are kept as few as possible while the network training error is kept as small as possible, which is equivalent to automatically reducing the scale of the network; (2) for a training sample set of fixed size, the scale of the neural network is then far smaller than that of the training sample set, which reduces the chance of over-training and improves generalization. The invention uses the Bayesian algorithm to determine the hyperparameters, so that they are adjusted adaptively during network training and reach their optimal values.

The beneficial effects of the invention are as follows. The invention adopts a quantitative analysis method based on Bayesian regularization: a BP neural network improved by Bayesian regularization quantitatively analyzes heavy metal elements from XRF spectra, and the Bayesian algorithm determines the hyperparameters, so that they are adjusted adaptively during network training and optimized, which in turn improves the generalization ability of the neural network. Experimental data show that the method provided by the invention can effectively improve the accuracy of quantitative analysis of heavy metal elements in soil and has advantages over the traditional BP neural network method.

Drawings

FIG. 1 is a flow chart of an embodiment;

FIG. 2 is a spectrum of a soil sample in the example;

FIG. 3 is a comparison between the predicted content and the actual content of lead Pb by the Bayesian regularization-improved BP neural network model in the embodiment;

FIG. 4 is a comparison between the predicted content and the actual content of Pb by the BP neural network based on the gradient descent method in the example.

Detailed Description

Because the X-ray fluorescence spectrum of soil suffers from severe background interference, the acquired X-ray fluorescence spectrum samples are first denoised and background-subtracted using discrete wavelets. The composition information of the heavy metal element and of other elements that may interfere with it is then calculated by the Compton normalization method and used as the input of the BP neural network improved by Bayesian regularization, with the actual content of the corresponding heavy metal element as the model output. Regularization improves the generalization ability of the BP neural network by modifying the objective function. The method keeps the effective weights of the network as few as possible while keeping the network training error as small as possible, which is equivalent to automatically reducing the scale of the network. For a training sample set of fixed size, the scale of the neural network is then far smaller than that of the training sample set, which reduces the probability of over-training, improves generalization, and enables accurate analysis of the heavy metal content of soil.

Based on the above idea, this embodiment provides a quantitative analysis method based on Bayesian regularization. The workflow is shown in FIG. 1, and the specific steps are as follows:

Step 1: Collect the X-ray fluorescence spectrum information g_i, i = 1, 2, ..., n, of n soil samples with an X-ray fluorescence spectrometer; the spectrum of a soil sample is shown in FIG. 2;

Step 2: Use the discrete wavelet transform to denoise the spectral information and subtract its background, obtaining the processed spectral information g1_i, i = 1, 2, ..., n;

Step 3: Calculate the composition information x_i, i = 1, 2, ..., n, of the heavy metal element and of several other elements that may interfere with it by the Compton normalization method;

Step 4: Divide the samples into N training samples and (n - N) prediction samples. For the training set D = (x_i, t_i), i = 1, 2, ..., N, the composition information x_i of the heavy metal element and of the other possibly interfering elements is standardized and used as the variable data of the input nodes, and the actual content t_i of the corresponding heavy metal element is used as the output of the model;

Step 5: Determine the structure of the BP neural network and initialize the hyperparameters α and β and the weights. After the first training step, the parameters of the objective function F(W) are restored to the initial setting. The objective function comprises an error function and a decay function of the network:

F(W) = βE_D + αE_W

where E_D is the error function of the network, E_W is the decay function, f(x_i, W) is the output value of the network, t_i is the actual value, N is the total number of training samples, W is the parameter vector of the neural network, and the hyperparameters α and β control the distribution of the connection weights and thresholds.

Step 6: Minimize the objective function F(W) using the Levenberg-Marquardt algorithm. The network weight iteration formula used during training is:

W_{h+1} = W_h - [J(W_h)^T J(W_h) + μI]^{-1} J(W_h)^T e

where W_h is the network weight vector at the h-th iteration, μ is an adaptive scalar, and e is the error vector. When μ is small, the Levenberg-Marquardt algorithm approaches the Gauss-Newton method; when μ is large, it approaches the gradient descent method. The Levenberg-Marquardt algorithm is an existing optimization algorithm; optimization here means finding the parameter vector that minimizes the function value.

Step 7: Calculate the number of effective parameters γ and the updated estimates of the hyperparameters α and β of the objective function F(W):

where M is the total number of network parameters, tr is the trace of a matrix, ∇ is the vector differential operator, the Hessian matrix ∇²F(W) is obtained by the Gauss-Newton approximation, J is the Jacobian matrix of the training-set sample errors, and I_M is the Jacobian matrix of E_W. The optimal values of the hyperparameters α and β are determined by the Bayesian algorithm: the objective function F(W) is expanded in a Taylor series around the minimum point W_MP, at which its gradient is zero, the higher-order terms are neglected, and the regularization parameters are optimized at W_MP according to Bayes' rule:

Step 8: Repeat steps 6 and 7 until convergence;

Step 9: Complete the parameter training of the BP neural network improved by Bayesian regularization, and test the network model with the prediction samples.
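Steps 6 to 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: a two-parameter linear model f(x, W) = w0 + w1·x stands in for the BP neural network, μ is held fixed instead of being adapted, the Levenberg-Marquardt step is applied to the regularized objective by adding the α terms to the normal equations, and the α, β, γ updates use the standard Foresee-Hagan forms (E_D as a sum of squared errors, E_W as a sum of squared weights). The function names and the data are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def train_bayes_reg(xs, ts, iterations=20, mu=1e-3):
    """Bayesian-regularized fit of f(x, W) = w0 + w1*x (stand-in for the network)."""
    w = [0.1, 0.1]            # small nonzero initial weights (assumption)
    alpha, beta = 0.0, 1.0    # initial hyperparameters (assumption)
    gamma = float(len(w))
    n, m = len(xs), len(w)
    # Rows of the error Jacobian J are [1, x_i]; J^T J is constant for this model.
    sx, sxx = sum(xs), sum(x * x for x in xs)
    jtj = [[float(n), sx], [sx, sxx]]
    for _ in range(iterations):
        # Step 6: one LM step on F(W) = beta*E_D + alpha*E_W.
        errors = [w[0] + w[1] * x - t for x, t in zip(xs, ts)]
        jte = [sum(errors), sum(e * x for e, x in zip(errors, xs))]
        a_mat = [[beta * jtj[0][0] + alpha + mu, beta * jtj[0][1]],
                 [beta * jtj[1][0], beta * jtj[1][1] + alpha + mu]]
        grad = [beta * jte[0] + alpha * w[0], beta * jte[1] + alpha * w[1]]
        a_inv = inv2(a_mat)
        w = [w[0] - (a_inv[0][0] * grad[0] + a_inv[0][1] * grad[1]),
             w[1] - (a_inv[1][0] * grad[0] + a_inv[1][1] * grad[1])]
        # Step 7: effective parameter count and hyperparameter updates.
        e_d = sum((w[0] + w[1] * x - t) ** 2 for x, t in zip(xs, ts))
        e_w = w[0] ** 2 + w[1] ** 2
        hess = [[2 * beta * jtj[0][0] + 2 * alpha, 2 * beta * jtj[0][1]],
                [2 * beta * jtj[1][0], 2 * beta * jtj[1][1] + 2 * alpha]]
        h_inv = inv2(hess)    # Gauss-Newton Hessian: 2*beta*J^T J + 2*alpha*I
        gamma = m - 2 * alpha * (h_inv[0][0] + h_inv[1][1])
        alpha = gamma / (2 * e_w)          # assumes e_w > 0
        beta = (n - gamma) / (2 * e_d)     # assumes noisy data, so e_d > 0
    # Step 8 is the loop itself; a fixed iteration count replaces a convergence test.
    return w, alpha, beta, gamma


# Hypothetical noisy data generated around t = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ts = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
w, alpha, beta, gamma = train_bayes_reg(xs, ts)
print(w, alpha, beta, gamma)
```

With real spectra the model would be a multi-layer network, J would be recomputed at every iteration, and μ would be increased or decreased depending on whether a step reduces F(W).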

The embodiment is further described below with a specific example: quantitative analysis of the heavy metal Pb in soil using the BP neural network model improved by Bayesian regularization. To ensure the stability of the soil spectral data, the spectrometer is operated in soil mode within its allowable ambient temperature range. After the test parameters were optimized, the spectrometer was operated at an X-ray tube voltage of 45 kV and a tube current of 25 μA; the peaking time of the multi-channel acquisition system was set to 0.8 μs, and the test time was 90 s. To reduce the influence of errors and other environmental factors during testing, each soil sample was measured 3 times and the average was taken as the final spectral data for that sample.

The first step: Sequentially collect the X-ray fluorescence spectrum data g_i, i = 1, 2, ..., 57, of the 57 national standard soil samples. As can be seen from the spectrum (FIG. 2), redundant information such as noise and the substrate makes the characteristic peaks harder to identify, which affects the accurate calculation of peak areas and reduces the precision of quantitative analysis. Therefore, a coif3 wavelet is used for denoising and a sym4 wavelet is used for background subtraction.
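The idea behind wavelet denoising can be illustrated with a minimal, dependency-free sketch. It deliberately swaps in a single-level Haar wavelet with soft thresholding in place of the coif3 and sym4 wavelets named above, and the threshold value and the sample counts are hypothetical; the patent does not state its decomposition level or thresholding rule.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient sets."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Keep the smooth part, shrink the detail part, then reconstruct."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))


# Hypothetical channel counts: a peak near channels 2-3 on a noisy baseline.
spectrum = [10.0, 12.0, 50.0, 52.0, 11.0, 9.0, 10.0, 10.0]
smoothed = denoise(spectrum, threshold=2.0)
print(smoothed)
```

Small channel-to-channel fluctuations end up in the detail coefficients and are suppressed, while the peak survives in the approximation coefficients. A multi-level decomposition with a longer wavelet follows the same pattern.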

The second step: The target material of the spectrometer is Ag, so the Compton normalization method is adopted, that is, the count of the target element Pb is divided by the Compton count of the Ag peak. The composition information obtained in this way for the heavy metal element Pb and for the other elements that may interfere with it (As, Cu and Bi) forms a 57 × 4 element composition information matrix.
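As a rough illustration of Compton normalization, the sketch below divides each element's peak count by the Compton scatter count of the Ag tube line to produce one row of the composition matrix. The function name and all count values are hypothetical, and how peak counts are extracted from the spectrum is outside the scope of this sketch.

```python
def compton_normalize(peak_counts, compton_count):
    """Divide each element's peak count by the Compton scatter peak count."""
    if compton_count <= 0:
        raise ValueError("Compton count must be positive")
    return {element: count / compton_count
            for element, count in peak_counts.items()}


# Hypothetical peak counts for one soil spectrum (illustrative numbers only).
sample_counts = {"Pb": 1200.0, "As": 800.0, "Cu": 400.0, "Bi": 100.0}
ag_compton_count = 4000.0  # hypothetical Compton count of the Ag peak

features = compton_normalize(sample_counts, ag_compton_count)
# One row of the 57 x 4 composition matrix, in a fixed element order.
row = [features[element] for element in ("Pb", "As", "Cu", "Bi")]
print(row)
```

Repeating this for all 57 spectra yields the 57 × 4 matrix used as network input.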

The third step: The element composition information is randomly divided into 45 training samples and 12 prediction samples. The composition information of Pb and the related elements in the training set is used as the input of the BP neural network model improved by Bayesian regularization, and the Pb content is used as the output. The parameter values of the model are determined through preliminary experiments, which completes the training, and the network model is then tested with the prediction samples.
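The split and the data standardization can be sketched as follows. This is an assumption-laden illustration: z-score standardization using training-set statistics is one common choice and is not stated in the text, the random seed is arbitrary, and the 57 × 4 matrix and Pb contents are synthetic stand-ins.

```python
import random
import statistics


def split_and_standardize(rows, targets, n_train, seed=0):
    """Randomly split samples, then z-score inputs with training-set statistics."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    n_features = len(rows[0])
    means = [statistics.fmean(rows[i][j] for i in train_idx)
             for j in range(n_features)]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    stds = [statistics.pstdev([rows[i][j] for i in train_idx]) or 1.0
            for j in range(n_features)]

    def scale(row):
        return [(v - mean) / std for v, mean, std in zip(row, means, stds)]

    train = [(scale(rows[i]), targets[i]) for i in train_idx]
    test = [(scale(rows[i]), targets[i]) for i in test_idx]
    return train, test


# Synthetic stand-in for the 57 x 4 composition matrix and the Pb contents.
rows = [[0.01 * i, 0.02 * i + 1.0, float(i % 5), 0.5 * i] for i in range(57)]
targets = [10.0 + 2.0 * i for i in range(57)]

train, test = split_and_standardize(rows, targets, n_train=45)
print(len(train), len(test))
```

Computing the mean and standard deviation on the training samples only keeps information from the 12 prediction samples out of the model.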

Steps 5 to 9 are then carried out, finally yielding the comparison between the predicted and actual content of the heavy metal Pb in soil using the BP neural network model improved by Bayesian regularization. As can be seen from FIG. 3, the predicted and actual values in the test sample set of the Pb model agree well, which indicates that the BP neural network model improved by Bayesian regularization has high accuracy and is suitable for determining the content of heavy metal elements in soil, demonstrating the effectiveness of the embodiment.

To further illustrate the superiority of the method of the invention, FIG. 4 shows the comparison of predicted and actual values in the test sample set using a conventional BP neural network based on the gradient descent method. Comparison with FIG. 3 shows that the embodiment performs better in determining the content of the heavy metal element Pb in soil. In addition, the running times of the embodiment and of the conventional gradient-descent-based BP neural network are 2.446 seconds and 3.212 seconds respectively, so the embodiment is computationally more efficient, about 1.313 times as fast as the latter.

Claims

1. A method for quantitative analysis of heavy metals in soil based on Bayesian regularization, characterized in that it comprises the following steps:

1) using discrete wavelets to denoise the X-ray fluorescence spectrum samples of soil samples and subtract their background to obtain processed spectral information;

2) using the Compton normalization method to calculate the composition information of the heavy metal elements and preset interfering elements in the processed spectral information, using the composition information of the heavy metal elements and of the interfering elements as the respective inputs of the BP neural network, and using the actual content of the corresponding heavy metal element as the output of the model;

wherein the hyperparameters used in training the BP neural network are determined by Bayesian regularization, and during training the BP neural network uses a regularization-corrected error function as the objective function F(W), F(W) = βE_D + αE_W;

E_D is the original error function:

f(x_i, W) is the output value of the BP neural network, t_i is the actual content of the heavy metal element in the i-th sample, N is the total number of training samples, and W is the parameter vector of the neural network;

E_W is the decay function:

α and β are hyperparameters determined by the Bayesian algorithm:

M is the total number of parameters of the BP neural network, γ is the number of effective parameters, and W_MP is the minimum point of the objective function F(W), at which its gradient is zero.