Disclosure of Invention
The invention aims to provide a method for improving the accuracy of X-ray fluorescence (XRF) analysis of heavy metal elements in soil.
The technical scheme adopted by the invention is a quantitative analysis method for soil heavy metals based on Bayesian regularization, comprising the following steps:
1) denoising and background subtraction are carried out on the X-ray fluorescence spectrum of the soil sample by discrete wavelet transform to obtain processed spectrum information;
2) the component information of the heavy metal element and of preset interfering elements is calculated from the processed spectrum information by the Compton normalization method; this component information is used as the input of a BP neural network, and the actual content of the corresponding heavy metal element is used as the output of the model;
the hyperparameters used to train the BP neural network are determined by Bayesian regularization, and during training the BP neural network uses a regularized error function as the objective function F(W), where F(W) = βE_D + αE_W;
E_D is the original error function, in which f(x_i, W) is the output value of the BP neural network for the i-th sample, t_i is the actual content of the heavy metal element in the i-th sample, N is the total number of training samples, and W is the parameter vector of the neural network;
E_W is the decay function;
α and β are hyperparameters determined by a Bayesian algorithm, in which M is the total number of parameters of the BP neural network, γ is the number of effective parameters, and W_MP is the minimum point of the objective function F(W), at which its gradient is zero.
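For orientation, the quantities named above take the following form in the standard Bayesian regularization formulation of MacKay and of Foresee and Hagan. This is a sketch under the assumption that the invention follows that convention. In particular, the sum-of-squares normalization of E_D and E_W and the symbols w_j, H, and J are assumptions that the text above does not state.

```latex
\begin{aligned}
F(W) &= \beta E_D + \alpha E_W, \\
E_D &= \sum_{i=1}^{N} \bigl(f(x_i, W) - t_i\bigr)^2, \qquad
E_W = \sum_{j=1}^{M} w_j^2, \\
\alpha &= \frac{\gamma}{2E_W(W_{MP})}, \qquad
\beta = \frac{N - \gamma}{2E_D(W_{MP})}, \\
\gamma &= M - 2\alpha\,\mathrm{tr}\bigl(H^{-1}\bigr), \qquad
H = \nabla^2 F(W_{MP}) \approx 2\beta J^{T}J + 2\alpha I_M .
\end{aligned}
```

Under these assumed definitions, γ lies between 0 and M and measures how many of the M parameters are actually determined by the data rather than by the decay term.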
Correcting the objective function by Bayesian regularization improves the generalization capability of the neural network. Introducing Bayesian regularization has several advantages: (1) the effective weights of the network are kept as small as possible while the training error is kept as small as possible, which is equivalent to automatically reducing the scale of the network; (2) for a training sample set of fixed size, the scale of the neural network is far smaller than that of the training set, so the chance of over-training is reduced and the generalization capability is improved. The invention uses a Bayesian algorithm to determine the hyperparameters, so that they are adjusted adaptively during network training and reach their optimal values.
The beneficial effects of the invention are as follows. The invention adopts a quantitative analysis method based on Bayesian regularization: a BP neural network improved by Bayesian regularization quantitatively analyzes heavy metal elements from XRF spectra, and the hyperparameters are determined by a Bayesian algorithm, so that they are adjusted adaptively during network training and reach their optimal values, which further improves the generalization capability of the neural network. Experimental data show that the method effectively improves the accuracy of quantitative analysis of heavy metal elements in soil and has advantages over the traditional BP neural network method.
Detailed Description
Because the X-ray fluorescence spectrum of soil suffers from serious background interference, the acquired X-ray fluorescence spectrum is first denoised and background-subtracted by discrete wavelet transform. The component information of the heavy metal element and of other elements that may interfere with it is then calculated by the Compton normalization method and used as the input of the BP neural network improved by Bayesian regularization, and the actual content of the corresponding heavy metal element is used as the output of the model. Regularization improves the generalization capability of the BP neural network by modifying the objective function. The method keeps the effective weights of the network as small as possible while keeping the training error as small as possible, which is equivalent to automatically reducing the scale of the network. For a training sample set of fixed size, the scale of the neural network is then far smaller than that of the training set, so the probability of over-training is reduced, the generalization capability is improved, and the heavy metal content of soil can be analyzed accurately.
Based on the above, this embodiment provides a quantitative analysis method based on Bayesian regularization. The workflow is shown in Fig. 1, and the specific steps are as follows:
Step 1: the X-ray fluorescence spectrum information g_i, i = 1, 2, ..., n, of n soil samples is collected with an X-ray fluorescence spectrometer; the spectrum of a soil sample is shown in Fig. 2;
Step 2: noise removal and background subtraction are applied to the spectrum information by discrete wavelet transform to obtain the processed spectrum information g1_i, i = 1, 2, ..., n;
Step 3: the component information x_i, i = 1, 2, ..., n, of the heavy metal element and of several other possibly interfering elements is calculated by the Compton normalization method;
Step 4: the samples are divided into N training samples and (n - N) prediction samples. For the training set D = {(x_i, t_i)}, i = 1, 2, ..., N, the component information x_i of the heavy metal element and of the other possibly interfering elements is standardized and used as the input node variables, and the actual content t_i of the corresponding heavy metal element is used as the output of the model;
Step 5: the structure of the BP neural network is determined, and the hyperparameters α and β and the weights are initialized. After the first training step, the parameters of the objective function F(W) are restored to the initial setting. The objective function comprises the error function and the decay function of the network:
F(W) = βE_D + αE_W
where E_D is the error function of the network, E_W is the decay function, f(x_i, W) is the output value of the network, t_i is the actual value, N is the total number of training samples, W is the parameter vector of the neural network, and the hyperparameters α and β control the distribution of the connection weights and thresholds.
Step 6: the objective function F(W) is minimized with the Levenberg-Marquardt algorithm. The network weight update formula used during training is:
W_{h+1} = W_h - [J(W_h)^T J(W_h) + μI]^{-1} J(W_h)^T e
where W_h is the network weight vector at the h-th iteration, μ is an adaptive scalar, and e is the error vector. When μ is small, the Levenberg-Marquardt algorithm approaches the Gauss-Newton method; when μ is large, it approaches gradient descent. The Levenberg-Marquardt algorithm is an established optimization algorithm; optimization here means finding the parameter vector that minimizes the function value.
Step 7: the number of effective parameters γ and the updated estimates of the hyperparameters α and β of the objective function F(W) are calculated,
where M is the total number of network parameters, tr denotes the trace of a matrix, ∇ is the vector differential operator, the Hessian matrix H is approximated by the Gauss-Newton method, J is the Jacobian matrix of the training-set sample errors, and I_M is the Jacobian matrix of E_W. The optimal values of the hyperparameters α and β are determined by a Bayesian algorithm: the objective function F(W) is expanded in a Taylor series near its minimum point W_MP, where the gradient is zero, the higher-order terms are neglected, and the regularization parameters are optimized at W_MP according to Bayes' rule;
Step 8: steps 6 and 7 are repeated until convergence;
Step 9: parameter training of the BP neural network improved by Bayesian regularization is completed, and the network model is tested with the prediction samples.
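Steps 5 to 8 can be sketched as follows. This is a minimal illustration, assuming NumPy, a single tanh hidden layer, a forward-difference Jacobian, sum-of-squares forms of E_D and E_W, and the standard Gauss-Newton expressions from the Bayesian regularization literature (H ≈ 2βJ^T J + 2αI_M, γ = M - 2α tr(H^{-1}), α = γ/(2E_W), β = (N - γ)/(2E_D)). The function names, the starting values of α, β, and μ, the factor-of-10 schedule for μ, and the synthetic data are illustrative assumptions, not values given by the invention. Because the objective includes the decay term, the update below uses the gradient and Hessian of F(W) rather than of E_D alone.

```python
import numpy as np

def predict(W, X, n_hidden):
    """One-hidden-layer tanh network with linear output; W is the flat parameter vector."""
    n_in = X.shape[1]
    k = n_in * n_hidden
    W1 = W[:k].reshape(n_in, n_hidden)
    b1 = W[k:k + n_hidden]
    W2 = W[k + n_hidden:k + 2 * n_hidden]
    b2 = W[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def error_jacobian(W, X, n_hidden, h=1e-6):
    """Forward-difference Jacobian of the errors e_i = f(x_i, W) - t_i with respect to W."""
    base = predict(W, X, n_hidden)
    J = np.empty((X.shape[0], W.size))
    for j in range(W.size):
        Wp = W.copy()
        Wp[j] += h
        J[:, j] = (predict(Wp, X, n_hidden) - base) / h
    return J

def train_bayes_reg(X, t, n_hidden=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, n_in = X.shape
    M = n_in * n_hidden + 2 * n_hidden + 1      # total number of parameters
    W = rng.normal(scale=0.5, size=M)           # step 5: initialise the weights
    alpha, beta, mu = 0.01, 1.0, 0.01           # assumed starting values
    gamma = float(M)
    I = np.eye(M)
    for _ in range(n_iter):
        e = predict(W, X, n_hidden) - t
        J = error_jacobian(W, X, n_hidden)
        F = beta * (e @ e) + alpha * (W @ W)
        grad = 2 * beta * (J.T @ e) + 2 * alpha * W
        H = 2 * beta * (J.T @ J) + 2 * alpha * I
        # Step 6: Levenberg-Marquardt step; increase mu until F decreases.
        improved = False
        while mu < 1e10:
            W_try = W - np.linalg.solve(H + mu * I, grad)
            e_try = predict(W_try, X, n_hidden) - t
            if beta * (e_try @ e_try) + alpha * (W_try @ W_try) < F:
                W, mu, improved = W_try, max(mu / 10, 1e-12), True
                break
            mu *= 10
        if not improved:                        # step 8: no further decrease, stop
            break
        # Step 7: effective number of parameters, then new alpha and beta.
        e = predict(W, X, n_hidden) - t
        J = error_jacobian(W, X, n_hidden)
        H = 2 * beta * (J.T @ J) + 2 * alpha * I
        gamma = M - 2 * alpha * np.trace(np.linalg.inv(H))
        alpha = gamma / (2 * (W @ W))
        beta = (N - gamma) / (2 * (e @ e))
    return W, alpha, beta, gamma

# Synthetic data for illustration only (not the soil measurements).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(40, 2))
t = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.05, size=40)
W, alpha, beta, gamma = train_bayes_reg(X, t)
rmse = float(np.sqrt(np.mean((predict(W, X, 3) - t) ** 2)))
print(f"gamma={gamma:.2f} of M={W.size}, rmse={rmse:.3f}")
```

The finite-difference Jacobian keeps the sketch short; an analytic Jacobian obtained by backpropagation would be the usual choice for larger networks.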
The embodiment is further described below with a specific example, namely the quantitative analysis of the heavy metal Pb in soil using the BP neural network model improved by Bayesian regularization. To ensure the stability of the soil spectrum data, the spectrometer is operated in soil mode within the allowable ambient temperature range. After the test parameters are optimized, the spectrometer operates at an X-ray tube voltage of 45 kV and a tube current of 25 μA, the peaking time of the multichannel acquisition system is set to 0.8 μs, and the test time is 90 s. To reduce the influence of measurement error and environmental factors, each soil sample is measured 3 times and the average is taken as the final spectral data of that sample.
The first step: the X-ray fluorescence spectrum data g_i, i = 1, 2, ..., 57, of the 57 national standard soil samples are collected in sequence. As can be seen from the figure, redundant information such as noise and background makes the characteristic peaks harder to identify, which affects the accurate calculation of peak areas and reduces the precision of quantitative analysis. Therefore, a coif3 wavelet is used for denoising, and a sym4 wavelet is used for background subtraction.
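The idea of wavelet denoising and background subtraction can be sketched in pure Python as follows. To stay self-contained, this sketch swaps in the Haar wavelet for the coif3 and sym4 wavelets named above, which would normally come from a wavelet library. The soft universal threshold, the decomposition levels, and the choice to estimate the background from the coarsest approximation are assumptions made for illustration, not parameters given in the text.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Return [cA_L, cD_L, ..., cD_1]; the length must be divisible by 2**levels."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return [approx] + details

def haar_reconstruct(coeffs):
    """Invert haar_decompose, starting from the coarsest level."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        finer = []
        for a, d in zip(approx, detail):
            finer += [(a + d) / SQRT2, (a - d) / SQRT2]
        approx = finer
    return approx

def soft(c, thr):
    """Soft thresholding: shrink the coefficient towards zero by thr."""
    return math.copysign(max(abs(c) - thr, 0.0), c)

def denoise(signal, levels=3):
    """Shrink detail coefficients with the universal threshold sigma * sqrt(2 ln n)."""
    coeffs = haar_decompose(signal, levels)
    sigma = statistics.median(abs(c) for c in coeffs[-1]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(signal)))
    kept = [coeffs[0]] + [[soft(c, thr) for c in d] for d in coeffs[1:]]
    return haar_reconstruct(kept)

def subtract_background(signal, levels=5):
    """Estimate the slowly varying background from cA_L alone and subtract it."""
    coeffs = haar_decompose(signal, levels)
    zeroed = [coeffs[0]] + [[0.0] * len(d) for d in coeffs[1:]]
    background = haar_reconstruct(zeroed)
    return [s - b for s, b in zip(signal, background)]
```

A call such as `subtract_background(denoise(spectrum))` would apply both stages in the order described, where `spectrum` is a hypothetical list of channel counts whose length is a multiple of 32.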
The second step: the target of the spectrometer is Ag, so the Compton normalization method is used. The counts of the target element Pb and of the elements that may interfere with it (As, Cu, and Bi) are each divided by the Compton counts of the Ag peak, and the resulting component information forms a 57 × 4 element component information matrix.
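A minimal sketch of this normalization is given below. The function name, the dictionary layout, and the count values are hypothetical and serve only to show the arithmetic; they are not measured data.

```python
ELEMENTS = ("Pb", "As", "Cu", "Bi")

def compton_normalize(peak_counts, compton_counts):
    """Divide each element's peak counts by the Compton scatter counts of the same spectrum."""
    if compton_counts <= 0:
        raise ValueError("Compton counts must be positive")
    return [peak_counts[el] / compton_counts for el in ELEMENTS]

# Hypothetical counts for two spectra (illustrative values only).
spectra = [
    ({"Pb": 1200.0, "As": 300.0, "Cu": 450.0, "Bi": 60.0}, 6000.0),
    ({"Pb": 800.0, "As": 200.0, "Cu": 500.0, "Bi": 40.0}, 4000.0),
]

# One row per sample, one column per element; 57 spectra would give a 57 x 4 matrix.
matrix = [compton_normalize(peaks, compton) for peaks, compton in spectra]
```

Dividing by the Compton counts makes the two hypothetical samples comparable even though their raw Pb counts differ: both rows have a normalized Pb value of 0.2.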
The third step: the element component information is randomly divided into 45 training samples and 12 prediction samples. The component information of Pb and the related elements in the training set is used as the input of the BP neural network model improved by Bayesian regularization, and the Pb content is used as the output. The parameter values of the model are determined through preliminary experiments, which completes the training, and the network model is then tested with the prediction samples.
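The random split, together with the data standardization mentioned in step 4, might look like the sketch below. It assumes z-score standardization with statistics computed on the training set only; the text does not specify the standardization formula, and the function names and seed are hypothetical.

```python
import random
import statistics

def split_samples(rows, n_train, seed=0):
    """Shuffle sample indices reproducibly and split into training and prediction sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    train = [rows[i] for i in idx[:n_train]]
    test = [rows[i] for i in idx[n_train:]]
    return train, test

def fit_standardizer(train_x):
    """Column means and population standard deviations of the training inputs."""
    cols = list(zip(*train_x))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds

def standardize(x_rows, means, stds):
    """Apply the training-set statistics to any set of input rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in x_rows]
```

Fitting the means and standard deviations on the 45 training samples and reusing them for the 12 prediction samples keeps information from the prediction set out of training.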
Following steps 5 to 9, the comparison between the predicted and actual Pb content in soil is finally obtained with the BP neural network model improved by Bayesian regularization. As can be seen from Fig. 3, the predicted and actual values in the Pb test sample set agree well, which indicates that the BP neural network model improved by Bayesian regularization has high accuracy and is suitable for determining the content of heavy metal elements in soil, thereby demonstrating the effectiveness of the embodiment.
To further illustrate the superiority of the method of the invention, Fig. 4 shows the comparison of the predicted and actual values in the test sample set for a conventional BP neural network based on gradient descent. Comparison with Fig. 3 shows that the embodiment performs better in determining the content of the heavy metal element Pb in soil. In addition, the running times of the embodiment and of the conventional gradient-descent BP neural network are 2.446 seconds and 3.212 seconds respectively, so the embodiment is computationally more efficient, running about 1.313 times as fast as the latter.
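The speed ratio follows directly from the two reported running times, as the short check below shows. The function name is hypothetical.

```python
def speedup(baseline_seconds, method_seconds):
    """Ratio of the baseline running time to the running time of the method."""
    return baseline_seconds / method_seconds

ratio = speedup(3.212, 2.446)          # conventional BP time / embodiment time
saving = (3.212 - 2.446) / 3.212       # fraction of running time saved
print(f"{ratio:.3f}")                  # prints 1.313
print(f"{saving:.1%}")                 # prints 23.8%
```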