CN117074333B - COD soft measurement model construction method based on ultraviolet-visible light absorption spectrum - Google Patents
COD soft measurement model construction method based on ultraviolet-visible light absorption spectrum
- Publication number
- CN117074333B (CN202310628547.5A)
- Authority
- CN
- China
- Prior art keywords
- spectrum
- spectrum data
- constructing
- prediction model
- cod
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum, in the technical field of water quality monitoring. The method comprises the following steps: testing the repeatability and stability of the full-band absorbance of a spectral probe to obtain a qualified spectral probe; collecting, with the qualified probe, the full-band absorption spectra of different water samples as the spectral data of those samples, pairing each sample's spectral data with its concentration value to form a water sample record, and building a database from the records; screening variables in the water sample spectral data with the CARS (competitive adaptive reweighted sampling) algorithm and selecting the variable combination with the lowest cross-validation root mean square error as the training sample set of the prediction model; and constructing an LSSVR (least squares support vector regression) prediction model from the training sample set and optimizing its parameters with the DBO (dung beetle optimizer) algorithm, yielding a COD soft measurement model based on the ultraviolet-visible light absorption spectrum with good accuracy and generalization capability.
Description
Technical Field
The invention relates to the technical field of water quality monitoring, in particular to a COD soft measurement model construction method based on ultraviolet-visible light absorption spectrum.
Background
COD (Chemical Oxygen Demand) is one of the important indicators of water pollution and is significant for water quality monitoring and treatment. Soft measurement is a real-time monitoring technique based on mathematical models and data analysis; it is reagent-free, real-time, accurate and economical. However, because COD detection data are noisy and nonlinear, existing COD soft measurement models have poor accuracy and generalization capability, so the conventional approach of on-site sampling followed by chemical reagent analysis is still generally used.
In view of this, the inventors provide a method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum, which improves the accuracy and generalization capability of the COD soft measurement model so that it can replace conventional on-site sampling and chemical reagent analysis for COD detection.
Disclosure of Invention
The application aims to provide a method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum, which solves the problems described in the Background.
The technical aim of the application is achieved by the following technical scheme, comprising:
S1, testing the repeatability and stability of the full-band absorbance of the spectral probe to obtain a qualified spectral probe;
S2, collecting, with the qualified spectral probe, the full-band absorption spectra of different water samples as the spectral data of those samples, pairing each sample's spectral data with its concentration value to form a water sample record, and building a database from multiple groups of water sample records;
S3, screening variables in the water sample spectral data in the database with the CARS algorithm, and selecting the variable combination with the lowest cross-validation root mean square error as the training sample set of the prediction model;
S4, constructing an LSSVR prediction model based on the training sample set, and optimizing the parameters of the LSSVR prediction model with the DBO algorithm, to form a COD soft measurement model based on the ultraviolet-visible light absorption spectrum.
With this technical scheme, the Pearson coefficient ensures that the data collected by the spectral probe are usable, and the spectral data and concentration values of water samples of different components are collected to build a database; the CARS algorithm extracts the absorbance at the characteristic wavelengths within the band of interest as the input of the prediction model; the LSSVR algorithm is chosen as the main modeling method, and the DBO algorithm adaptively optimizes the hyperparameters of the LSSVR algorithm, which improves model accuracy. The COD soft measurement model constructed by the method has good accuracy and generalization capability, can replace the conventional COD detection method of on-site sampling and chemical reagent analysis, and can be used for online water quality monitoring.
Further, the step S1 includes:
collecting, with the spectral probe, multiple full-band spectra of the same sample at the same moment and calculating the Pearson coefficient of these spectra; when the Pearson coefficient meets the correlation requirement, the spectral probe is considered to have passed the absorbance repeatability test;
collecting, with the spectral probe, multiple full-band spectra of the same sample over a period of time and calculating the Pearson coefficient of these spectra; when the Pearson coefficient meets the correlation requirement, the spectral probe is considered to have passed the absorbance stability test;
and taking a spectral probe that passes both the absorbance repeatability test and the absorbance stability test as a qualified spectral probe.
Further, the Pearson coefficient r is expressed as:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$
where $x_i$ and $y_i$ represent two different sets of full-band spectral data, $\bar{x}$ and $\bar{y}$ are the mean values of $x_i$ and $y_i$ respectively, n is the number of data points in each set, and r is the Pearson coefficient. When r is greater than 0.99, the two sets of full-band spectral data are considered to meet the correlation requirement.
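The Pearson check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names `pearson` and `probe_passes` are hypothetical, and comparing every pair of repeated spectra against the 0.99 threshold is an assumption about how "multiple" spectra are checked.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length spectra."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("spectra must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant spectrum has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def probe_passes(spectra, threshold=0.99):
    """True if every pair of repeated spectra has r above the threshold.

    Pairwise comparison is an assumption made for this sketch.
    """
    return all(pearson(a, b) > threshold for a, b in combinations(spectra, 2))
```

The same `probe_passes` call would serve both tests: spectra taken at the same moment for repeatability, and spectra taken over a period of time for stability.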
Further, the step S2 includes:
collecting spectral data of water samples of different components with the qualified spectral probe, where the spectral data are the full-band absorption spectra of the water samples; for a water sample of a given component, collecting spectral data several times and taking the average as the spectral data of that sample; denoising the spectral data and combining them with the concentration value of the corresponding water sample to form a water sample record; and building a database from multiple groups of water sample records.
Further, the step S3 includes:
S31, in each sampling run, randomly selecting a given amount of spectral data from the database as a calibration set to build a PLS model, and taking the absolute value of each regression coefficient in that PLS model, expressed as a percentage of the total, as the importance index of the corresponding variable, where a variable is the absorbance at one characteristic wavelength in the spectral data;
S32, removing the variables whose regression coefficients have relatively small absolute values by means of an exponential decay function, to obtain the retained variable ratio;
S33, determining the number of variables to sample from the retained variable ratio, building a PLS model on the resampled variables, and calculating the cross-validation root mean square error;
S34, repeating the calculation for the set number of loop iterations to obtain multiple variable combinations and their cross-validation root mean square errors, and selecting the variable combination with the minimum cross-validation root mean square error as the training sample set of the prediction model.
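The ranking and selection logic in S31, S32 and S34 can be illustrated as follows. This is a sketch under stated assumptions: the regression coefficients and RMSECV values are taken as given (the PLS fitting and cross-validation themselves are not shown), the helper names are hypothetical, and only the "keep the largest coefficients" part of CARS is covered.

```python
def importance_weights(coefs):
    """Normalise |regression coefficient| so that the weights sum to 1."""
    total = sum(abs(c) for c in coefs)
    if total == 0:
        raise ValueError("all regression coefficients are zero")
    return [abs(c) / total for c in coefs]


def keep_top_variables(coefs, ratio):
    """Indices of the variables retained when a fraction `ratio` is kept."""
    weights = importance_weights(coefs)
    n_keep = max(1, round(ratio * len(coefs)))
    ranked = sorted(range(len(coefs)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:n_keep])


def best_subset(candidates):
    """Pick the (variable_subset, rmsecv) pair with the smallest RMSECV."""
    return min(candidates, key=lambda pair: pair[1])
```

For example, with coefficients `[0.5, -2.0, 0.1, 1.4]` and a retained ratio of 0.5, `keep_top_variables` returns the indices of the two largest coefficients by absolute value.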
Further, when the PLS model is built at the i-th sampling, the retained variable ratio $r_i$ is:
$$r_i = a e^{-k i}$$
$$a = \left(\frac{P}{2}\right)^{1/(N-1)}$$
$$k = \frac{\ln(P/2)}{N-1}$$
where $r_i$ is the ratio of variables retained at the i-th sampling, $i \in (0, N)$ is the loop iteration index, N is the maximum number of loop iterations, a and k are constants, and P is the total number of variables.
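To see how the retained ratio evolves, the formulas above can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and running the index i from 1 to N is an assumption about the iteration convention.

```python
import math


def retention_ratios(num_variables, num_iterations):
    """Return [r_1, ..., r_N] with r_i = a * exp(-k * i)."""
    if num_variables < 2 or num_iterations < 2:
        raise ValueError("need at least 2 variables and 2 iterations")
    half = num_variables / 2
    a = half ** (1 / (num_iterations - 1))
    k = math.log(half) / (num_iterations - 1)
    return [a * math.exp(-k * i) for i in range(1, num_iterations + 1)]
```

With these constants the ratio starts at 1 (all P variables) and decays to 2/P, so that two variables remain at the last iteration.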
Further, the step S4 includes:
S41, constructing a prediction model function according to a training sample set of the prediction model, and constructing a kernel function and a bias term to obtain an LSSVR prediction model;
S42, optimizing gamma and sigma parameters in the LSSVR prediction model kernel function through a DBO algorithm to obtain the COD soft measurement model based on the ultraviolet-visible light absorption spectrum.
Further, the prediction model function is:
$$f(x) = \Omega^{T}\varphi(x) + b$$
where f(x) is the predicted COD concentration, x is the input variable, $\varphi(x)$ is the nonlinear mapping function, and Ω and b are the weight term and the bias term, respectively.
Further, the LSSVR prediction model is:
$$Y = \sum_{i=1}^{n} \alpha_i \, k(x, x_i) + b$$
where Y is the predicted COD concentration, $k(x, x_i)$ is the kernel function, $\alpha_i$ is the Lagrange multiplier, b is the bias term, and n is the number of training samples.
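A compact numerical sketch of an LSSVR of this form is given below. It rests on assumptions not stated in the text: a Gaussian (RBF) kernel with width sigma, gamma as the regularization parameter, and the textbook LSSVR linear system for solving the multipliers and the bias. The function names are hypothetical and NumPy is used for the linear algebra.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(x_train, y_train, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = x_train.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(x_train, x_train, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y_train))
    solution = np.linalg.solve(system, rhs)
    return solution[1:], solution[0]  # alpha, b


def lssvr_predict(x_new, x_train, alpha, b, sigma):
    """Y = sum_i alpha_i * k(x, x_i) + b for each row of x_new."""
    return rbf_kernel(x_new, x_train, sigma) @ alpha + b
```

In this formulation the two values a parameter search has to supply are exactly `gamma` and `sigma`; everything else follows from one linear solve.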
Further, the step S42 includes:
S421, determining the search range of the gamma and sigma parameters in the kernel function;
S422, initializing a given number of dung beetles, where each dung beetle represents one combination of gamma and sigma whose initial values are generated randomly within the determined parameter range;
S423, updating the positions of the initial dung beetle population according to the formulas for the ball-rolling, dancing, breeding, foraging and stealing behaviors;
S424, for each individual dung beetle, fitting the sample data with an LSSVR model and calculating the mean squared error as the fitness of that individual;
S425, updating the identities and positions of the dung beetle agents: updating the position of each individual according to the DBO agent-identity update rule and the current position information of the individuals;
S426, repeating steps S423 to S425 until the preset number of iterations is reached or the preset accuracy requirement is met;
S427, selecting the individual with the minimum fitness value as the optimal solution, whose gamma and sigma are the gamma and sigma parameters of the kernel function in the final LSSVR prediction model.
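The control flow of S421 to S427 can be outlined as below. This is only a skeleton: the DBO behavior formulas are not reproduced here, so a plain Gaussian perturbation around the current best individual stands in for the position updates of S423 and S425. The function name, the default settings and the `fitness` callable (which in the method would be the LSSVR mean squared error for a given gamma and sigma) are all hypothetical.

```python
import random


def optimise(fitness, bounds, pop_size=20, iterations=50, tol=1e-6, seed=0):
    """Population search over (gamma, sigma); returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    # S421/S422: random initial positions inside the search range.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(p) for p in population]
    best_idx = min(range(pop_size), key=lambda j: scores[j])
    best, best_score = list(population[best_idx]), scores[best_idx]
    for step in range(iterations):  # S426: iteration limit
        if best_score <= tol:       # S426: accuracy requirement
            break
        shrink = 1.0 - step / iterations
        for j in range(pop_size):
            # S423/S425 placeholder: NOT the DBO behaviour formulas,
            # just a bounded random step around the best individual.
            candidate = [
                min(hi, max(lo, b + rng.gauss(0.0, 0.1 * (hi - lo) * shrink)))
                for b, (lo, hi) in zip(best, bounds)
            ]
            score = fitness(candidate)  # S424: fitness of the individual
            if score < scores[j]:
                population[j], scores[j] = candidate, score
            if score < best_score:
                best, best_score = list(candidate), score
    return best, best_score  # S427: individual with minimum fitness
```

Swapping the placeholder step for the actual ball-rolling, dancing, breeding, foraging and stealing updates would leave the surrounding loop unchanged.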
Compared with the prior art, the application has the following beneficial effects. In the method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum provided by the application, the repeatability and stability of the full-band absorbance of the spectral probe are tested with the Pearson coefficient, which ensures that the collected data are usable; absorbance data at several specific wavelengths are screened with the CARS algorithm and used as the input of the LSSVR model, which improves the fitting accuracy of the COD concentration value and makes the model more interpretable; and the LSSVR prediction model is optimized using the global exploration and fast convergence of the DBO algorithm, forming a COD soft measurement model with good accuracy and generalization capability.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings:
FIG. 1 is a flow chart of a method according to an embodiment of the invention;
FIG. 2 shows the Pearson coefficient results of the absorbance repeatability analysis according to an embodiment of the invention;
FIG. 3 shows the Pearson coefficient results of the absorbance stability analysis according to an embodiment of the invention;
FIG. 4 illustrates the screening of characteristic wavelengths by the CARS algorithm according to an embodiment of the invention;
FIG. 5 shows the fit between the true values and the predicted values during model training according to an embodiment of the invention;
FIG. 6 shows the fitted equation relating the true values and the predicted values of the model according to an embodiment of the invention;
FIG. 7 is a flow chart of the algorithm implementing the method according to an embodiment of the invention.
Detailed Description
Hereinafter, the terms "comprises" or "comprising", as may be used in various embodiments of the present application, indicate the presence of the disclosed function, operation or element and do not exclude the addition of one or more further functions, operations or elements. Furthermore, as used in various embodiments of the application, the terms "comprises," "comprising," and their cognates are intended to denote only a particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be understood as excluding in advance the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
In various embodiments of the application, the expression "or" or "at least one of B or/and C" includes any or all combinations of the words listed together. For example, the expression "B or C" or "at least one of B or/and C" may include B, may include C, or may include both B and C.
Expressions such as "first" and "second" used in the various embodiments of the application may modify various constituent elements in those embodiments, but do not limit the elements themselves. For example, such expressions do not limit the order and/or importance of the elements; they are used only to distinguish one element from another. For example, a first user device and a second user device indicate different user devices, although both are user devices. Likewise, a first element could be termed a second element, and a second element could be termed a first element, without departing from the scope of the various embodiments of the present application.
It should be noted that: if it is described that one constituent element is "connected" to another constituent element or "connected" with another constituent element, a first constituent element may be directly connected to a second constituent element, and a third constituent element may be "connected" between the first constituent element and the second constituent element. Conversely, when one constituent element is "directly connected" to another constituent element or "directly connected" with another constituent element, it is understood that there is no third constituent element between the first constituent element and the second constituent element.
The terminology used in the various embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the various embodiments of the application. As used herein, the singular is intended to include the plural as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the application belong. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the various embodiments of the application.
For the purpose of making apparent the objects, technical solutions and advantages of the present application, the present application will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present application and the descriptions thereof are for illustrating the present application only and are not to be construed as limiting the present application.
COD soft measurement based on the ultraviolet-visible light absorption spectrum means predicting the COD concentration value from the ultraviolet-visible light absorption spectrum with a mathematical model. Because the input spectral data are noisy and nonlinear, current COD soft measurement models have poor accuracy and generalization capability.
In view of this, the inventors propose a method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum. In the input stage, a suitable feature extraction algorithm reduces the dimension of the model input while filtering out noisy data. In the prediction stage, a mathematical model is used that has good prediction accuracy and interpretability, is robust to outliers, and is widely applied in spectral modeling analysis. In addition, an intelligent optimization algorithm with fast convergence and strong global search capability, suitable for optimizing many kinds of functions, is used to handle the nonlinear, multimodal and multi-dimensional problems that arise in model construction. Taken together, these choices yield a high-precision prediction model with strong generalization capability. The method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum is described below with reference to the embodiments and drawings.
Example 1
This embodiment provides a method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum. The aim is to establish a least squares support vector regression model between COD values and ultraviolet spectral data whose usability has been verified, and, through model training and error calculation, to obtain a mathematical model whose error lies within the range allowed by national and industry standards, so that it can be used for COD soft measurement with improved extrapolation capability. The model parameters are also optimized with the DBO algorithm, which improves convergence and strengthens the generalization capability of the model. The prediction model is trained on databases built from water samples of different components and concentrations from different regions, completing the inversion of COD from the absorption spectrum.
To achieve the above objective, the method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum of this embodiment (see FIG. 1) mainly includes four parts:
S1, testing the repeatability and stability of the full-band absorbance of the spectral probe to obtain a qualified spectral probe;
S2, collecting, with the qualified spectral probe, the full-band absorption spectra of different water samples as the spectral data of those samples, pairing each sample's spectral data with its concentration value to form a water sample record, and building a database from multiple groups of water sample records;
S3, screening variables in the water sample spectral data in the database with the CARS algorithm, and selecting the variable combination with the lowest cross-validation root mean square error as the training sample set of the prediction model;
S4, constructing an LSSVR prediction model based on the training sample set, and optimizing the parameters of the LSSVR prediction model with the DBO algorithm, to form a COD soft measurement model based on the ultraviolet-visible light absorption spectrum.
Step S1 is described below. In step S1, the repeatability and stability of the full-band absorbance of the spectral probe are tested to obtain a qualified spectral probe. Spectral data of various COD-contributing substances are collected with the spectral probe; spectra under different conditions are obtained by controlling factors such as temperature, concentration and acquisition time; and Pearson coefficient analysis of the repeatability and stability of the probe ensures that the spectral data it produces are usable.
In this example, the step S1 includes:
collecting, with the spectral probe, multiple full-band spectra of the same sample at the same moment and calculating the Pearson coefficient of these spectra; when the Pearson coefficient meets the correlation requirement, the spectral probe is considered to have passed the absorbance repeatability test;
collecting, with the spectral probe, multiple full-band spectra of the same sample over a period of time and calculating the Pearson coefficient of these spectra; when the Pearson coefficient meets the correlation requirement, the spectral probe is considered to have passed the absorbance stability test;
and taking a spectral probe that passes both the absorbance repeatability test and the absorbance stability test as a qualified spectral probe. The Pearson coefficient r described above is expressed as:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$
where $x_i$ and $y_i$ represent two different sets of full-band spectral data, $\bar{x}$ and $\bar{y}$ are the mean values of $x_i$ and $y_i$ respectively, n is the number of data points in each set, and r is the Pearson coefficient. When r is greater than 0.99, the two sets of full-band spectral data are considered to meet the correlation requirement.
For a specific implementation of this embodiment, please refer to FIGS. 2-3. Absorbance repeatability test: the sample is 15 mg of potassium hydrogen phthalate; 10 absorbance measurements are taken at the same moment, for example at 10:46 on September 22, 2022, and the 10 spectra are compared with one another by Pearson coefficient analysis, with the result shown in FIG. 2. Absorbance stability test: the sample is 0.3 g of glycine, measured over a period of time, for example at 9:00, 10:00, 11:00, 13:00, 14:00, 15:00, 16:00, 17:00 and 18:00 on September 22, 2022; the 9 spectra are compared with one another by Pearson coefficient analysis, with the result shown in FIG. 3.
Step S2 is described below. In step S2, the full-band absorption spectra of different water samples are collected with the qualified spectral probe as the spectral data of those samples; each sample's spectral data is paired with its concentration value to form a water sample record, and a database is built from multiple groups of water sample records. Specifically, the spectral probe is used, following its operating procedure, to collect spectral data of water samples of various components; each acquisition takes the average of several measurements; and the spectral data are denoised before being added to the database as its data source. The data in the database are also divided into a model training set and a model test set in a given proportion, which standardizes the source of the model training data.
In this example, the step S2 includes:
collecting spectral data of water samples of different components with the qualified spectral probe, where the spectral data are the full-band absorption spectra of the water samples; for a water sample of a given component, collecting spectral data several times and taking the average as the spectral data of that sample; denoising the spectral data and combining them with the concentration value of the corresponding water sample to form a water sample record; and building a database from multiple groups of water sample records.
In this embodiment, a qualified spectral probe is used, in accordance with its operating specifications, to collect spectral data of water samples of different components from different regions. The spectral data are the absorption spectra of the water samples over the full band of 194 nm to 1054 nm. For example, 10 groups of spectral data are collected for a water sample of a given component, and their average is taken as the spectral data of that sample; the concentration value of the sample is obtained correspondingly. The spectral data and the concentration value are combined into a water sample record, which is stored with the spectral data of the corresponding region, and the spectral database is built from these records. Each water sample record in the spectral database is a pair $(x_i, y_i) \in R^n \times R^m$, where $x_i$ is the full-band absorption spectral data (absorbance data) of the water sample and $y_i$ is its COD concentration value.
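The record-building step can be sketched as follows. The names `WaterSample`, `average_spectrum` and `moving_average` are hypothetical, the numeric values in the usage note are made up for illustration, and the moving-average filter is only a stand-in: the text does not say which denoising method is used.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class WaterSample:
    region: str        # where the sample was taken
    absorbance: list   # mean absorbance per wavelength (x_i)
    cod: float         # measured COD concentration (y_i)


def average_spectrum(repeats):
    """Point-wise mean of repeated scans of one water sample."""
    if not repeats:
        raise ValueError("no scans supplied")
    n = len(repeats[0])
    if any(len(scan) != n for scan in repeats):
        raise ValueError("all scans must cover the same wavelengths")
    return [fmean(values) for values in zip(*repeats)]


def moving_average(spectrum, window=5):
    """Simple smoothing; an assumed placeholder for the denoising step."""
    half = window // 2
    smoothed = []
    for j in range(len(spectrum)):
        lo, hi = max(0, j - half), min(len(spectrum), j + half + 1)
        smoothed.append(fmean(spectrum[lo:hi]))
    return smoothed
```

A record would then be created as, for instance, `WaterSample("site-A", moving_average(average_spectrum(scans)), 35.0)`, and the database is simply a collection of such records grouped by region.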
Step S3 is described below. In step S3, variables in the water sample spectral data in the database are screened with the CARS algorithm, and the variable combination with the lowest cross-validation root mean square error is selected as the training sample set of the prediction model. The aim is to reduce the dimension of strongly nonlinear spectral data with the CARS algorithm. During dimension reduction an adaptive weighting strategy is used: variables are weighted adaptively according to how they compete, and the optimal combination of characteristic variables is selected, which keeps the important information in the input while reducing noise interference. In addition, the computer-program implementation of the CARS algorithm returns the set of raw data at the selected characteristic wavelengths, which can be used for subsequent analyses such as the elimination of interference factors, reducing the black-box character of the model.
In this example, the step S3 includes:
S31, randomly selecting a certain amount of spectral data from the database each time as a calibration set to build a PLS model, and taking the percentage of the absolute value of each regression coefficient in the PLS model of each sampling run as the importance index of the corresponding variable, wherein a variable is the absorbance data at a given characteristic wavelength in the spectral data;
S32, removing variables whose regression coefficients have relatively small absolute values by means of an exponential decay function, to obtain the retained variable ratio;
S33, determining the number of sampled variables from the retained variable ratio, performing PLS modeling by resampling, and calculating the cross-validation root mean square error;
S34, performing loop calculation for the set number of loop iterations to obtain a plurality of variable combinations and their cross-validation root mean square errors, and selecting the variable combination with the smallest cross-validation root mean square error as the training sample set of the prediction model.
In step S32, when the PLS model is built in the i-th sampling, the retained variable ratio $r_i$ is:
$$r_i = a e^{-ki}$$
$$a = (P/2)^{1/(N-1)}$$
$$k = \ln(P/2)/(N-1)$$
where $r_i$ is the ratio of retained variables in the i-th sampling, $i \in (0, N]$ is the loop iteration index, $N$ is the maximum number of loop iterations, $a$ and $k$ are constants, and $P$ is the total number of variables.
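As a quick check of these formulas, substituting $i = 1$ gives $r_1 = a e^{-k} = 1$ and substituting $i = N$ gives $r_N = 2/P$, so the schedule starts with all variables and ends with two. The sketch below evaluates the schedule; the values P = 861 and N = 50 and the rounding to a whole number of variables are assumptions chosen for illustration only.

```python
import math

def retained_ratio(i, n_iter, n_vars):
    """Retained variable ratio r_i = a * exp(-k * i) for iteration i = 1..n_iter."""
    a = (n_vars / 2) ** (1 / (n_iter - 1))
    k = math.log(n_vars / 2) / (n_iter - 1)
    return a * math.exp(-k * i)

# Hypothetical settings: 861 wavelength variables, 50 iterations.
P, N = 861, 50
ratios = [retained_ratio(i, N, P) for i in range(1, N + 1)]

# Number of variables kept per iteration (rounded, never fewer than two).
counts = [max(2, round(r * P)) for r in ratios]
```

Here `counts` begins at P and decays to 2, which matches the boundary conditions implied by the constants a and k.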
In this embodiment, the following steps are performed:
A. Monte Carlo model sampling: a subset is randomly drawn from the spectral data in the database for modeling, with a sampling ratio of 70%-90%; a PLS model is built on the subset, and the percentage of the absolute value of each regression coefficient is used as the importance of the variable, that is, its explanatory power for the target variable.
B. Exponentially decaying wavelength selection: all P variables are used for the first modeling run, and the number of variables retained decreases over the N iterations. The ratio $r_i$ of variables retained in the i-th sampling is determined by:
$$r_i = a e^{-ki}$$
Since the first sampling is modeled using all P variables and the N-th sampling is modeled using two variables, the boundary conditions of the above equation are $r_1 = 1$ and $r_N = 2/P$, from which the constants a and k of the exponentially decreasing function are obtained:
$$a = (P/2)^{1/(N-1)}$$
$$k = \ln(P/2)/(N-1)$$
C. Adaptive reweighted sampling: the variables determined for the iteration in step B are further screened by resampling, in which the weight of a variable governs how often it is drawn; a prediction model is then built on the screened variables, and RMSECV is used to verify the validity of the model.
D. Loop iteration: loop calculation is performed for the set number of loop iterations, and the required characteristic variables are fixed according to the minimum RMSECV. The locations of the characteristic wavelengths are shown in Fig. 4.
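Steps B and C for a single iteration can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the regression coefficients are taken as given (in practice they would come from the PLS fit of step A), the function name `cars_select_step` is hypothetical, and the PLS modeling and RMSECV evaluation are deliberately left out.

```python
import random

def cars_select_step(coefs, ratio, rng):
    """One screening step: enforced removal by ratio, then weighted resampling.

    coefs: regression coefficients, one per variable (assumed given;
           at least one must be nonzero).
    ratio: retained variable ratio r_i for this iteration.
    """
    p = len(coefs)
    total = sum(abs(c) for c in coefs)
    weights = [abs(c) / total for c in coefs]  # importance of each variable
    n_keep = max(2, round(ratio * p))
    # Step B: keep the n_keep variables with the largest weights.
    ranked = sorted(range(p), key=lambda j: weights[j], reverse=True)[:n_keep]
    # Step C: draw with replacement, probability proportional to weight;
    # variables that are never drawn drop out.
    drawn = rng.choices(ranked, weights=[weights[j] for j in ranked], k=n_keep)
    return sorted(set(drawn))

rng = random.Random(0)
kept = cars_select_step([0.9, -0.05, 0.4, 0.01, -0.7, 0.0], 0.5, rng)
```

In a full loop, a model would be refitted on `kept` and its RMSECV recorded at every iteration, and the subset with the smallest RMSECV would be retained as in step D.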
Step S4 is described below. In step S4, an LSSVR prediction model is constructed from the training sample set, and the parameters of the LSSVR prediction model are optimized by the DBO algorithm to form the COD soft measurement model based on the ultraviolet-visible light absorption spectrum. The aim is to use LSSVR, which is widely used and generalizes well, as the prediction model for analyzing the correspondence between the ultraviolet spectral data and the COD concentration value, and to use the DBO algorithm, which simulates the behavior of dung beetles such as rolling dung balls, to find the optimal LSSVR parameters, so that the LSSVR prediction model has stronger global search capability, faster convergence and better robustness.
Specifically, in this example, step S4 includes:
S41, constructing a prediction model function according to a training sample set of the prediction model, and constructing a kernel function and a bias term to obtain an LSSVR prediction model;
S42, optimizing gamma and sigma parameters of a kernel function in the LSSVR prediction model through a DBO algorithm to obtain the COD soft measurement model based on the ultraviolet-visible light absorption spectrum.
The following describes step S41, which illustrates the process of generating the LSSVR prediction model:
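The training sample set screened by the CARS algorithm is $\{(x_i, y_i)\}_{i=1}^{N}$, where $N$ is the total number of samples, $x_i \in R^n$ is the sample data with index $i$, $R^n$ is the $n$-dimensional real space, and $y_i$ is the sample label with index $i$. Based on the regression support vector machine, a prediction model function of the following form is constructed:
$$f(x) = \omega^T \varphi(x) + b$$
where $f(x)$ is the COD concentration value; $x$ is the input variable; $\varphi(\cdot)$ is the nonlinear approximation function; and $\omega$ and $b$ are the weight and bias terms. The following operations are then performed:
A. Construct the objective function of the LSSVR optimization problem:
$$\min_{\omega, b, \xi} J(\omega, \xi) = \frac{1}{2}\omega^T\omega + \frac{r}{2}\sum_{i=1}^{N}\xi_i^2$$
where $r$ is the regularization coefficient and $\xi_i^2$ is the squared error term.
B. Change the inequality constraint of the SVM into an equality constraint:
$$y_i = \omega^T\varphi(x_i) + b + \xi_i, \quad i = 1, \dots, N$$
C. Construct the Lagrange function:
$$L(\omega, b, \xi, \alpha) = J(\omega, \xi) - \sum_{i=1}^{N}\alpha_i\left(\omega^T\varphi(x_i) + b + \xi_i - y_i\right)$$
where $\alpha_i$ is the Lagrange multiplier.
D. According to the KKT conditions, the partial derivative with respect to each variable must be 0 at the optimum:
$$\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{N}\alpha_i\varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N}\alpha_i = 0, \quad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow \alpha_i = r\xi_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow \omega^T\varphi(x_i) + b + \xi_i - y_i = 0$$
Eliminating $\omega$ and $\xi$ yields a linear system:
$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & ZZ^T + r^{-1}I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$
where $ZZ^T$ and $I$ are the kernel matrix and the identity matrix.
E. In the above linear system:
$$Z = [\varphi(x_1), \dots, \varphi(x_N)]^T, \quad y = [y_1, \dots, y_N]^T, \quad \mathbf{1} = [1, \dots, 1]^T, \quad \alpha = [\alpha_1, \dots, \alpha_N]^T$$
F. Let $A = ZZ^T + r^{-1}I$. The solution of the above linear system is then:
$$b = \frac{\mathbf{1}^T A^{-1} y}{\mathbf{1}^T A^{-1} \mathbf{1}}, \quad \alpha = A^{-1}(y - b\mathbf{1})$$
and the LSSVR prediction model is obtained:
$$Y = \sum_{i=1}^{N}\alpha_i k(x, x_i) + b$$
where $Y$ is the predicted COD concentration value and $k(x, x_i)$ is the kernel function; a radial basis function (RBF) is selected as the kernel function in this example.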
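The fitting procedure of step S41 amounts to solving one bordered linear system whose lower-right block is the kernel matrix plus the identity scaled by the inverse regularization coefficient, and then predicting with a kernel expansion plus bias. The sketch below shows this in plain Python. It rests on several assumptions that are not taken from the source: the RBF kernel is written as exp(-||u - v||^2 / (2 sigma^2)), which is one common parameterization; `reg` plays the role of the regularization coefficient; the small Gaussian-elimination solver stands in for any linear solver; and the five-point data set is a toy example.

```python
import math

def rbf(u, v, sigma):
    """RBF kernel exp(-||u - v||^2 / (2 sigma^2)) (assumed parameterization)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(m, rhs):
    """Solve m z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def lssvr_fit(xs, ys, reg, sigma):
    """Return (alpha, b) from [[0, 1^T], [1, K + I/reg]] [b; alpha] = [0; y]."""
    n = len(xs)
    m = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / reg  # add I/reg on the diagonal of the kernel block
        m.append(row)
    sol = solve(m, [0.0] + list(ys))
    return sol[1:], sol[0]

def lssvr_predict(x, xs, alpha, b, sigma):
    """Evaluate sum_i alpha_i * k(x, x_i) + b."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs)) + b

# Toy data (hypothetical): one feature, target y = 2x + 1.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
alpha, b = lssvr_fit(xs, ys, reg=1000.0, sigma=2.0)
pred = lssvr_predict([2.5], xs, alpha, b, sigma=2.0)
```

Because the system is linear, training needs no iterative optimizer, which is the practical reason the two remaining hyperparameters can be tuned by an outer search.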
Step S42 is described below; it sets out the specific process of searching for the optimal LSSVR prediction model parameters with the DBO algorithm. Step S42 includes:
S421, determining a search range of gamma and sigma parameters in the kernel function;
S422, initializing a certain number of dung beetles, wherein each dung beetle represents a combination of gamma and sigma, and initial values of the gamma and the sigma are randomly generated in a determined parameter range;
S423, carrying out position update on the initially generated dung beetle population according to formulas of rolling ball behaviors, dancing behaviors, propagation behaviors, foraging behaviors and stealing behaviors;
S424, for each individual dung beetle, fitting the sample data with an LSSVR model and calculating the mean squared error as the fitness of that individual;
S425, updating the identity and the position of each dung beetle agent: updating the position of each individual by the update rule of the DBO agent identity according to the current position information of the individual dung beetles;
S426, repeating steps S423 to S425 until the preset number of iterations is reached or the preset precision requirement is met;
S427, finally selecting the individual with the minimum fitness value as the optimal solution, the corresponding gamma and sigma being the gamma and sigma parameters of the kernel function in the final LSSVR prediction model.
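The control flow of steps S421 to S427 can be sketched as a generic population-based search. Note what the sketch is and is not: the five DBO position-update formulas (ball rolling, dancing, breeding, foraging, stealing) are not reproduced here, and a simple random move toward the current best stands in for them, so this is not the DBO algorithm itself. The function name, the default settings and the quadratic stand-in fitness are all assumptions; in the described method the fitness would be the mean squared error of an LSSVR model trained with the candidate gamma and sigma.

```python
import random

def population_search(fitness, bounds, n_agents=20, n_iter=50, tol=1e-6, seed=0):
    """Generic population search over (gamma, sigma); placeholder for DBO updates."""
    rng = random.Random(seed)
    # S421/S422: random initial candidates inside the search range.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    scores = [fitness(*p) for p in pop]
    best_i = min(range(n_agents), key=scores.__getitem__)
    best, best_score = pop[best_i][:], scores[best_i]
    for it in range(n_iter):
        shrink = 1.0 - it / n_iter  # step size decays over the iterations
        for k in range(n_agents):
            # S423 (placeholder move): drift toward the best plus Gaussian noise.
            cand = []
            for d, (lo, hi) in enumerate(bounds):
                step = rng.gauss(0.0, 0.1 * shrink * (hi - lo))
                v = pop[k][d] + rng.random() * (best[d] - pop[k][d]) + step
                cand.append(min(hi, max(lo, v)))  # stay inside the range
            s = fitness(*cand)  # S424: evaluate fitness
            if s < scores[k]:   # S425: keep the move only if it improves
                pop[k], scores[k] = cand, s
            if s < best_score:
                best, best_score = cand[:], s
        if best_score <= tol:   # S426: stop early once precise enough
            break
    return best, best_score     # S427: best candidate found

# Stand-in fitness with a known minimum at gamma = 3.0, sigma = 0.5.
def toy_fitness(gamma, sigma):
    return (gamma - 3.0) ** 2 + (sigma - 0.5) ** 2

bounds = [(0.1, 10.0), (0.01, 2.0)]
best, best_score = population_search(toy_fitness, bounds)
```

Swapping the placeholder move for the actual DBO update rules would leave the surrounding loop, the fitness evaluation and the stopping test unchanged.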
The overall flow of the method is shown in Fig. 7. With the COD soft measurement model based on the ultraviolet-visible light absorption spectrum constructed in steps S1-S4, the COD concentration can be calculated from newly acquired spectral data. Figs. 5 and 6 show that the predictions of the COD soft measurement model constructed by the method do not differ significantly from the data obtained by traditional on-site sampling and chemical reagent analysis, which indicates that the detection method of the invention is highly accurate and can replace the traditional COD detection method; the COD soft measurement model obtained by the method can also be used for online monitoring of water quality.
The method is based on the quantitative analysis principle of the Lambert-Beer law. It exploits the fact that, in the full-band ultraviolet-visible absorption spectrum, the absorbance at certain wavelengths is related to the COD concentration value, and it uses the COD value of the water sample measured by the national standard method as the target. First, the repeatability and stability of the full-band absorbance of the spectral probe are checked by the Pearson coefficient to ensure that the acquired data are usable. The absorbance data at a number of specific wavelengths are then screened by the CARS algorithm and used as the input of the LSSVR model, which improves the fitting precision of the COD concentration value and makes the model easier to interpret. The LSSVR prediction model is optimized using the global exploration and fast convergence of the DBO algorithm, and the post-training error is calculated on the established model. The number of iterations of the DBO algorithm is set so that the model reaches the target accuracy, precision or error index. For monitoring different water qualities, the data model of the corresponding regional river basin is selected to calculate the COD from spectral absorption.
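The Pearson check on the probe mentioned above can be sketched as follows. The threshold of 0.99 is the one stated in the claims; comparing every pair of scans is an assumption made here, since the text does not specify how multiple scans are paired, and the function names and sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long spectra."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("spectra must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probe_passes(scans, threshold=0.99):
    """True if every pair of scans correlates above the threshold (assumed pairing)."""
    return all(
        pearson(scans[i], scans[j]) > threshold
        for i in range(len(scans))
        for j in range(i + 1, len(scans))
    )

# Hypothetical scans: b is a slightly rescaled copy of a, c is a reshuffled one.
scan_a = [0.10, 0.40, 0.90, 0.30]
scan_b = [1.01 * v + 0.002 for v in scan_a]
scan_c = [0.30, 0.90, 0.40, 0.10]
```

The same function serves both checks: scans taken at the same moment test repeatability, and scans spread over a period of time test stability.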
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (6)
1. A method for constructing a COD soft measurement model based on the ultraviolet-visible light absorption spectrum, characterized by comprising the following steps:
S1, carrying out repeatability and stability inspection on the full-band absorbance of the spectral probe to obtain a spectral probe that is qualified in inspection;
S2, collecting the full-band absorption spectra of different water samples through the qualified spectral probe as the spectral data of the water samples, constructing water-sample records from the spectral data of the water samples and the concentration values corresponding to the water samples, and constructing a database from a plurality of groups of water-sample records;
S3, performing variable screening on the water-sample spectral data in the database through a CARS algorithm, and selecting the variable combination with the lowest cross-validation root mean square error as the training sample set of the prediction model;
S4, constructing an LSSVR prediction model based on the training sample set, optimizing parameters of the LSSVR prediction model based on a DBO algorithm, and forming a COD soft measurement model based on an ultraviolet-visible light absorption spectrum;
wherein step S4 comprises: S41, constructing a prediction model function from the training sample set of the prediction model, and constructing a kernel function and a bias term to obtain the LSSVR prediction model; S42, optimizing the gamma and sigma parameters of the kernel function in the LSSVR prediction model through the DBO algorithm to obtain the COD soft measurement model based on the ultraviolet-visible light absorption spectrum;
the prediction model function is: $f(x) = \omega^T \varphi(x) + b$, wherein $f(x)$ is the predicted COD concentration value, $x$ is the input variable, $\varphi(\cdot)$ is the nonlinear approximation function, and $\omega$ and $b$ are the weight term and the bias term;
the LSSVR prediction model is: $Y = \sum_{i=1}^{N} \alpha_i k(x, x_i) + b$, wherein $Y$ is the predicted COD concentration value, $k(x, x_i)$ is the kernel function, a radial basis function being selected as the kernel function, $\alpha_i$ is the Lagrange multiplier, and $b$ is the bias term;
step S42 comprises: S421, determining a search range of the gamma and sigma parameters in the kernel function; S422, initializing a certain number of dung beetles, wherein each dung beetle represents a combination of gamma and sigma, and the initial values of gamma and sigma are randomly generated within the determined parameter range; S423, updating the positions of the initially generated dung beetle population according to the formulas of the ball-rolling, dancing, breeding, foraging and stealing behaviors; S424, for each individual dung beetle, fitting the sample data with an LSSVR model and calculating the mean squared error as the fitness of that individual; S425, updating the identity and the position of each dung beetle agent: updating the position of each individual by the update rule of the DBO agent identity according to the current position information of the individual dung beetles; S426, repeating steps S423 to S425 until the preset number of iterations is reached or the preset precision requirement is met; S427, finally selecting the individual with the minimum fitness value as the optimal solution, the corresponding gamma and sigma being the gamma and sigma parameters of the kernel function in the final LSSVR prediction model.
2. The method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum according to claim 1, characterized in that step S1 comprises:
collecting multiple full-band spectral data of the same sample at the same moment through a spectral probe, calculating the Pearson coefficient of the multiple full-band spectral data, and, when the Pearson coefficient meets the correlation requirement, considering that the spectral probe passes the absorbance repeatability test;
collecting multiple full-band spectral data of the same sample over a period of time through the spectral probe, calculating the Pearson coefficient of the multiple full-band spectral data, and, when the Pearson coefficient meets the correlation requirement, considering that the spectral probe passes the absorbance stability test;
and taking a spectral probe that passes both the absorbance repeatability test and the absorbance stability test as the spectral probe qualified in inspection.
3. The method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum according to claim 2, characterized in that the Pearson coefficient r is expressed as:
$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$$
wherein $x_i$ and $y_i$ represent two different sets of full-band spectral data, $\bar{x}$ and $\bar{y}$ are the mean values of $x_i$ and $y_i$ respectively, and $r$ is the Pearson coefficient; when the value of $r$ is greater than 0.99, the Pearson coefficient of the two sets of full-band spectral data is considered to meet the correlation requirement.
4. The method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum according to claim 1, characterized in that step S2 comprises:
collecting spectral data of water samples of different compositions through the qualified spectral probe, collecting multiple spectral data for water samples of the same composition and taking their average as the spectral data of that water sample, the spectral data being the full-band absorption spectral data of the water sample; denoising the spectral data, combining the spectral data with the concentration value corresponding to the water sample to form a water-sample record, and constructing the database from a plurality of groups of water-sample records.
5. The method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum according to claim 1, characterized in that step S3 comprises:
S31, randomly selecting a certain amount of spectral data from the database each time as a calibration set to build a PLS model, and taking the percentage of the absolute value of each regression coefficient in the PLS model of each sampling run as the importance index of the corresponding variable, wherein a variable is the absorbance data at a given characteristic wavelength in the spectral data;
S32, removing variables whose regression coefficients have relatively small absolute values by means of an exponential decay function, to obtain the retained variable ratio;
S33, determining the number of sampled variables from the retained variable ratio, performing PLS modeling by resampling, and calculating the cross-validation root mean square error;
S34, performing loop calculation for the set number of loop iterations to obtain a plurality of variable combinations and their cross-validation root mean square errors, and selecting the variable combination with the smallest cross-validation root mean square error as the training sample set of the prediction model.
6. The method for constructing the COD soft measurement model based on the ultraviolet-visible light absorption spectrum according to claim 5, characterized in that, when the PLS model is built in the i-th sampling, the retained variable ratio $r_i$ is:
$$r_i = a e^{-ki}$$
$$a = (P/2)^{1/(N-1)}$$
$$k = \ln(P/2)/(N-1)$$
wherein $r_i$ is the ratio of retained variables in the i-th sampling, $i \in (0, N]$ is the loop iteration index, $N$ is the maximum number of loop iterations, $a$ and $k$ are constants, and $P$ is the total number of variables.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310628547.5A CN117074333B (en) | 2023-05-30 | COD soft measurement model construction method based on ultraviolet-visible light absorption spectrum |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117074333A CN117074333A (en) | 2023-11-17 |
CN117074333B CN117074333B (en) | 2024-11-15 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109001136A (en) * | 2018-09-20 | 2018-12-14 | 杭州绿洁水务科技股份有限公司 | A kind of COD on-line monitoring method based on ultraviolet visible light absorption spectrum |
CN109709057A (en) * | 2018-12-29 | 2019-05-03 | 四川碧朗科技有限公司 | Water quality indicator prediction model construction method and water quality indicator monitoring method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |