CN110726694A - Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm - Google Patents
- Publication number
- CN110726694A (application number CN201911006149.XA)
- Authority
- CN
- China
- Prior art keywords
- wavelength
- spectrum
- interval
- vector
- variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
The invention discloses a characteristic wavelength selection method and system based on a spectral variable gradient integrated genetic algorithm. The full spectrum is divided into a plurality of wavelength intervals; important wavelength intervals are extracted from all the wavelength intervals according to the variable importance in projection (VIP) coefficients and combined into an interval spectrum; randomly combined characteristic wavelength vectors of the interval spectrum are taken as the initial population of a genetic algorithm, the reciprocal of the root mean square error of a partial least squares regression model is taken as the fitness function of a characteristic wavelength vector, and the characteristic wavelength vector with the maximum fitness value is selected as the optimal characteristic wavelength vector; selection, crossover and mutation are applied to the population, and the resulting new individuals replace the original population to form a new population; the iteration continues until the set number of generations is reached, and the final optimal characteristic wavelength vector is output. The invention addresses the prior-art problem of selecting collinear and redundant wavelength variables, simplifies the calculation, improves the prediction precision, and gives the regression model better interpretability and generalization capability.
Description
Technical Field
The invention relates to the technical field of spectral analysis, in particular to a characteristic wavelength selection method and system based on a spectral variable gradient integrated genetic algorithm.
Background
Spectroscopic analysis identifies substances and determines their chemical composition and relative content from their spectra; it is an analytical method built on molecular and atomic spectroscopy. Since each atom has its own characteristic spectrum, a substance can be identified and its chemical composition determined from the spectrum. A substance can be analyzed qualitatively from its characteristic spectrum under different spectral analysis methods, and quantitatively from the spectral intensity. The characteristic wavelengths selected when establishing a spectral detection model have a great influence on the accuracy of the model. Existing characteristic wavelength variable selection methods based on swarm intelligence optimization algorithms have a high probability of selecting weakly correlated wavelength variables and easily fall into a local optimal solution.
When a soil sample is irradiated with visible near-infrared light, the hydrogen-containing chemical groups (such as C-H, O-H, S-H and N-H) in the soil nutrients are excited and produce overtone and combination-band absorption of molecular vibrations, so the soil nutrient content can be accurately measured by measuring the degree of visible near-infrared absorption with visible near-infrared spectral analysis. However, each soil nutrient has its own absorption wavelengths, the absorption signals are weak, the bands overlap, and interference such as environmental noise and irrelevant information is also present, so the near-infrared absorption spectrum of a soil sample is extremely complex. In addition, the spectral data of the same sample are collinear, which easily produces data redundancy. If a regression model is established with full-spectrum data, the severe overlap of the visible near-infrared spectrum and the collinearity between adjacent characteristic variables are difficult to eliminate, the prediction precision is low, and the model is complex and generalizes poorly. Characteristic wavelength selection methods based on swarm intelligence optimization algorithms take the root mean square error of the PLSR as the objective function and randomly search for the characteristic wavelength vector with the minimum root mean square error. However, because the characteristic wavelength variables are selected over the whole visible near-infrared spectrum, the probability of selecting weakly correlated wavelength variables is high, and the search easily falls into a local optimal solution.
Therefore, selecting from the wavelength variables of the visible near-infrared full spectrum the interval spectrum most strongly correlated with the soil nutrient target variable, and then selecting the characteristic wavelength variables on that interval spectrum, is the key to improving the prediction precision for soil nutrients.
Disclosure of Invention
The invention aims to solve the problems that the characteristic wavelength selection methods used in existing near-infrared spectral analysis easily fall into a local optimal solution and have a high probability of selecting weakly correlated wavelengths, so that the precision of near-infrared spectral analysis needs to be improved, and provides a characteristic wavelength selection method and system based on a spectral variable gradient integrated genetic algorithm.
In one aspect, the invention provides a characteristic wavelength selection method of a spectral variable gradient integrated genetic algorithm, which comprises the following steps:
scanning a plurality of soil samples with visible near-infrared spectrum scanning equipment to generate a visible near-infrared spectral data matrix, establishing a partial least squares regression model on the full-spectrum wavelength variables contained in the spectral data matrix, and determining the variable importance in projection (VIP) coefficients of the full-spectrum wavelength variables;
dividing the full spectrum of the spectral data matrix into a plurality of wavelength intervals, and extracting from all the wavelength intervals those whose wavelength variables have VIP coefficients greater than a predetermined value, to obtain the important wavelength intervals;
combining the important wavelength intervals of the spectral data matrix into an interval spectrum, taking randomly combined characteristic wavelength vectors of the interval spectrum as the initial population of the genetic algorithm, and solving the root mean square error of the partial least squares regression model;
taking the reciprocal of the root mean square error of the partial least squares regression model as the fitness function of a characteristic wavelength vector, and selecting the characteristic wavelength vector with the maximum fitness value as the optimal characteristic wavelength vector; applying selection, crossover and mutation to the initial population, and replacing the original population with the resulting new individuals to form a new population; and iterating until the set number of generations is reached, then outputting the final optimal characteristic wavelength vector.
Further, after the important wavelength intervals are obtained, a backward interval partial least squares regression algorithm is applied to each important wavelength interval: wavelength variables are removed one at a time until only one wavelength variable is left, and the combined wavelength vector corresponding to the minimum root mean square error of the partial least squares regression model within that interval is found. The new important wavelength intervals constructed in this way are combined into an interval spectrum, randomly combined characteristic wavelength vectors of the interval spectrum are taken as the initial population of the genetic algorithm, and the root mean square error of the partial least squares regression model is solved.
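The backward elimination inside one important wavelength interval can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `backward_eliminate` and the callback `rmse_of` (which would wrap a PLSR fit on the given columns) are hypothetical, and the rule for choosing which variable to drop at each step (the one whose removal gives the lowest RMSE) is an assumption, since the text does not specify it.

```python
def backward_eliminate(columns, rmse_of):
    """Remove one wavelength variable at a time until one is left.

    columns : list of column numbers in one important wavelength interval
    rmse_of : hypothetical callback returning the model RMSE for a column subset
    Returns the subset with the lowest RMSE seen, and that RMSE.
    """
    current = list(columns)
    best_subset, best_rmse = list(current), rmse_of(current)
    while len(current) > 1:
        # Assumption: drop the variable whose removal yields the lowest RMSE.
        trials = [(rmse_of(current[:k] + current[k + 1:]), k)
                  for k in range(len(current))]
        step_rmse, k = min(trials)
        current = current[:k] + current[k + 1:]
        if step_rmse < best_rmse:
            best_subset, best_rmse = list(current), step_rmse
    return best_subset, best_rmse


# Toy scoring function standing in for a PLSR fit (illustration only):
# columns 3 and 5 are "informative", every other column adds a small error.
def toy_rmse(subset):
    useful = {3, 5}
    noise = sum(1 for c in subset if c not in useful)
    missing = len(useful - set(subset))
    return 1.0 + 0.1 * noise + 1.0 * missing


subset, err = backward_eliminate([2, 3, 4, 5, 6], toy_rmse)
```

In practice `rmse_of` would be replaced by a cross-validated PLSR error on the calibration set; the toy function only makes the control flow visible.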
Further, the method for dividing the full spectrum into a plurality of wavelength intervals is as follows:
calculating the purity row vector of the full-spectrum wavelength variables and the linear purity gradient vector of the full-spectrum wavelength variables in the horizontal direction; and dividing the full spectrum into a plurality of wavelength intervals according to the sign changes of the gradient values in the linear purity gradient vector.
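As a rough illustration of this division rule, the sketch below takes a purity row vector, approximates the horizontal gradient by a first difference, and starts a new interval wherever the gradient changes sign. The function name `split_by_gradient_sign`, the first-difference gradient and the handling of flat (zero-gradient) stretches are assumptions; the text does not give these details.

```python
def split_by_gradient_sign(purity):
    """Split column indices 0..len(purity)-1 into inclusive (start, end)
    intervals, cutting at each sign change of the purity gradient."""
    n = len(purity)
    if n == 0:
        return []
    # Assumption: a first difference stands in for the linear purity gradient.
    grad = [purity[j + 1] - purity[j] for j in range(n - 1)]
    intervals = []
    start = 0
    prev_sign = 0
    for j, g in enumerate(grad):
        sign = (g > 0) - (g < 0)
        if sign == 0:
            continue  # flat stretch: stay in the current interval
        if prev_sign != 0 and sign != prev_sign:
            intervals.append((start, j))  # column j is the turning point
            start = j + 1
        prev_sign = sign
    intervals.append((start, n - 1))
    return intervals


example = split_by_gradient_sign([1, 2, 3, 2, 1, 2, 3])
# example == [(0, 2), (3, 4), (5, 6)]
```

Each returned pair is an inclusive column range, so the intervals together cover the full spectrum without overlap.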
Further, the fitness function F of the characteristic wavelength vector is expressed as follows:
$F = 1/\mathrm{RMSE}$, with $\mathrm{RMSE} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}\left(y_i - \hat{y}_i\right)^2}$,
wherein RMSE is the root mean square error of the partial least squares regression model established on the column data of the spectral data matrix, $y_i$ is the reference method test value of the i-th sample, $\hat{y}_i$ is the value predicted by the partial least squares regression model from the characteristic wavelength variables of the i-th sample, and $n_p$ is the number of samples.
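The fitness evaluation can be illustrated with a few lines of code. This is only a sketch: the function names and the small floor `eps`, which avoids division by zero when the predictions are perfect, are assumptions that do not appear in the text.

```python
import math


def rmse(y_ref, y_pred):
    """Root mean square error between reference values and predictions."""
    if not y_ref or len(y_ref) != len(y_pred):
        raise ValueError("y_ref and y_pred must be non-empty and equally long")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ref, y_pred)) / len(y_ref))


def fitness(y_ref, y_pred, eps=1e-12):
    """F = 1 / RMSE; eps is an assumed guard against a zero RMSE."""
    return 1.0 / max(rmse(y_ref, y_pred), eps)


f_example = fitness([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])  # every error is 1, so RMSE = 1
```

A smaller RMSE gives a larger F, so maximizing the fitness is equivalent to minimizing the prediction error.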
Further, a new population is formed according to the chosen population size, crossover probability, mutation probability and selection probability, wherein the mutation operator is a real-number-coded differential mutation operator calculated as follows:
Z(i,j)=D×(E(r1,j)-E(r2,j))+E(i,j),
wherein Z(i,j) is the real-number-coded offspring value of the j-th chromosome of the i-th individual; D is the mutation factor; E(r1,j) and E(r2,j) are the real-number-coded parent values of the j-th chromosome of the r1-th and r2-th individuals, r1 and r2 being randomly generated within the population; E(r1,j)-E(r2,j) is the difference between these two parent values; and E(i,j) is the real-number-coded parent value of the j-th chromosome of the i-th individual.
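A minimal sketch of this operator is given below. The formula is taken from the text; rounding the result to an integer and clipping it to the valid index range `[lo, hi]`, as well as drawing two distinct individuals r1 and r2, are assumptions added so that the offspring remains a usable wavelength index vector.

```python
import random


def mutate(E, i, r1, r2, D, lo, hi):
    """Apply Z(i,j) = D * (E(r1,j) - E(r2,j)) + E(i,j) to every position j."""
    child = []
    for j in range(len(E[i])):
        z = D * (E[r1][j] - E[r2][j]) + E[i][j]
        z = int(round(z))                 # assumption: round to an integer index
        child.append(min(max(z, lo), hi)) # assumption: clip to the valid range
    return child


def mutate_random(E, i, D, lo, hi, rng=random):
    """Pick two distinct individuals at random and mutate individual i."""
    r1, r2 = rng.sample(range(len(E)), 2)
    return mutate(E, i, r1, r2, D, lo, hi)


population = [[10, 20], [30, 5], [12, 9]]
child = mutate(population, 0, 1, 2, 0.5, 0, 100)
# position 0: 0.5 * (30 - 12) + 10 = 19; position 1: 0.5 * (5 - 9) + 20 = 18
```

Because the step is scaled by the difference between two members of the current population, the mutation is large while the population is diverse and shrinks as it converges.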
Further, the important wavelength intervals whose wavelength variables have VIP coefficients greater than the predetermined value are extracted from all the wavelength intervals and combined into one interval spectrum as follows:
the wavelength column numbers in each important wavelength interval are converted into a wavelength index number row vector of the interval spectrum; the range of column numbers of this row vector is the value range of the elements of a characteristic wavelength vector, and each column of data of the spectral data matrix is obtained through a mapping table between those column numbers and the entries of the interval spectrum wavelength index number row vector.
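One way to picture the mapping table is a flat list whose positions are the values a chromosome may take and whose entries are the original column numbers of the spectral data matrix. The sketch below is illustrative only; the function names and the list-of-lists representation of the spectra are assumptions.

```python
def build_index_map(important_intervals):
    """Concatenate the column numbers of all important intervals.

    Position p in the returned list is the interval-spectrum index,
    and the value at p is the full-spectrum column number."""
    index_map = []
    for interval in important_intervals:
        index_map.extend(interval)
    return index_map


def select_columns(spectra, index_map, chromosome):
    """Fetch the spectral columns addressed by a characteristic wavelength vector."""
    cols = [index_map[g] for g in chromosome]
    return [[row[c] for c in cols] for row in spectra]


index_map = build_index_map([[2, 3, 4], [8, 9]])   # -> [2, 3, 4, 8, 9]
spectra = [list(range(10)), list(range(10, 20))]   # two toy samples, ten columns
picked = select_columns(spectra, index_map, [0, 3, 4])
# chromosome values 0, 3, 4 address full-spectrum columns 2, 8, 9
```

With this layout the genetic algorithm only ever handles integers in the range 0 to `len(index_map) - 1`, which keeps population generation simple.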
In another aspect, the present invention provides a system for selecting a characteristic wavelength of a spectral variable gradient integrated genetic algorithm, comprising:
the partial least squares regression model establishing module is used for scanning a plurality of samples with visible near-infrared spectrum scanning equipment to generate a visible near-infrared spectral data matrix, establishing a partial least squares regression model on the full-spectrum wavelength variables contained in the spectral data matrix, and determining the variable importance in projection (VIP) coefficients of the full-spectrum wavelength variables;
the wavelength interval division module is used for dividing the full spectrum of the spectral data matrix into a plurality of wavelength intervals;
the important wavelength interval determining module is used for extracting from all the wavelength intervals those containing wavelength variables whose VIP coefficients are greater than the predetermined value, to obtain the important wavelength intervals;
the genetic algorithm selection module is used for combining the important wavelength intervals of the spectral data matrix into an interval spectrum, taking randomly combined characteristic wavelength vectors of the interval spectrum as the initial population of the genetic algorithm, and solving the root mean square error of the partial least squares regression model;
taking the reciprocal of the root mean square error of the partial least squares regression model as the fitness function of a characteristic wavelength vector, and selecting the characteristic wavelength vector with the maximum fitness value as the optimal characteristic wavelength vector; applying selection, crossover and mutation to the initial population, and replacing the original population with the resulting new individuals to form a new population; and iterating until the set number of generations is reached, then outputting the final optimal characteristic wavelength vector.
Further, the important wavelength interval determining module is also used for, after the important wavelength intervals are obtained, applying a backward interval partial least squares regression algorithm to each important wavelength interval, removing wavelength variables one at a time until only one is left, finding the combined wavelength vector corresponding to the minimum root mean square error of the partial least squares regression model within each important wavelength interval, and constructing the new important wavelength intervals.
The beneficial technical effects of the invention are as follows: the full spectrum of the spectral data matrix is divided into a plurality of wavelength intervals, the wavelength intervals whose wavelength variables have VIP coefficients greater than a predetermined value are extracted as important wavelength intervals and combined into an interval spectrum, and randomly combined characteristic wavelength vectors of the interval spectrum are taken as the initial population of the genetic algorithm. This greatly increases the probability that the genetic algorithm selects potentially optimal characteristic wavelength variables within the interval spectrum, and addresses the problem that swarm intelligence optimization algorithms searching over the visible near-infrared full-spectrum wavelength variables select collinear and redundant wavelength variables. The calculation of the regression model is simplified, the prediction precision is improved, and the regression model has better interpretability and generalization capability;
the visible near-infrared full spectrum is divided into a plurality of wavelength intervals according to the sign changes of the linear purity gradient of the full-spectrum variables; the criterion that the variable importance in projection (VIP) coefficient output by the partial least squares regression (PLSR) model be greater than a predetermined value is used to extract, from the full spectrum, the important wavelength intervals that explain the predicted target quantity well; these intervals are combined into an interval spectrum, and randomly combined characteristic wavelength vectors of the interval spectrum are taken as the initial population of the genetic algorithm. This raises the probability that the genetic algorithm selects strongly correlated wavelength variables from the interval spectrum, lowers the probability of selecting weakly correlated wavelength variables from the full spectrum, helps eliminate collinearity and redundant data, and improves the prediction precision of the regression model;
after the important wavelength intervals are obtained, a backward interval partial least squares regression algorithm is applied to each of them separately to find the combined wavelength vector corresponding to the minimum root mean square error of the partial least squares regression model within that interval; the new important wavelength intervals constructed in this way are combined into an interval spectrum, and randomly combined characteristic wavelength vectors of the interval spectrum are taken as the initial population of the genetic algorithm. This further raises the probability that the genetic algorithm selects strongly correlated wavelength variables in the interval spectrum, eliminates collinearity and redundant data more effectively, and gives the resulting regression model better prediction precision;
the invention provides a real-number-coded differential mutation operator; the improved genetic algorithm enlarges the search space for the global optimal solution, so that it can find the global optimal solution and converges quickly;
the invention further sets the range of column numbers of the interval spectrum wavelength index number row vector as the value range of the elements of a characteristic wavelength vector, obtains each column of data of the spectral data matrix through the mapping table between those column numbers and the entries of the interval spectrum wavelength index number row vector, and establishes a partial least squares regression model, so that both the generation of the characteristic wavelength vector population and the retrieval of spectral matrix data are simple and easy to implement.
Drawings
FIG. 1 is a flow chart of a method for selecting a characteristic variable of a spectral variable gradient integrated genetic algorithm according to an embodiment of the present invention;
FIG. 2 is a flow chart of a feature variable selection method of a spectral variable gradient integrated genetic algorithm according to another embodiment of the present invention;
FIG. 3 is a graph of the visible near infrared full spectrum variable purity of a soil fast-acting phosphorus calibration set according to an embodiment of the present invention;
FIG. 4 is a graph of a visible near infrared full spectrum variable purity gradient of a soil fast-acting phosphorus calibration set according to an embodiment of the present invention;
FIG. 5 is the VIP curve of the near-infrared full-spectrum PLSR for a soil fast-acting phosphorus calibration set according to an embodiment of the present invention;
FIG. 6 is the iterative optimization curve of the fitness function F of the improved genetic algorithm according to an embodiment of the present invention;
FIG. 7 shows the 25 optimal characteristic wavelengths selected by the improved genetic algorithm according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The characteristic variable selection method of the spectral variable gradient integrated genetic algorithm provided by the invention comprises the following steps (as shown in figure 1):
scanning a plurality of samples with visible near-infrared spectrum scanning equipment to generate a visible near-infrared spectral data matrix, establishing a partial least squares regression model on the full-spectrum wavelength variables contained in the spectral data matrix, and determining the variable importance in projection (VIP) coefficients of the full-spectrum wavelength variables;
dividing the full spectrum of the spectral data matrix into a plurality of wavelength intervals, and extracting from all the wavelength intervals those whose wavelength variables have VIP coefficients greater than a predetermined value, to obtain the important wavelength intervals;
combining the important wavelength intervals of the spectral data matrix into an interval spectrum, taking randomly combined characteristic wavelength vectors of the interval spectrum as the initial population of the genetic algorithm, and solving the root mean square error of the partial least squares regression model;
taking the reciprocal of the root mean square error of the partial least squares regression model as the fitness function of a characteristic wavelength vector, and selecting the characteristic wavelength vector with the maximum fitness value as the optimal characteristic wavelength vector; applying selection, crossover and mutation to the initial population, and replacing the original population with the resulting new individuals to form a new population; and iterating until the set number of generations is reached, then outputting the final optimal characteristic wavelength vector.
A regression model is then established with the optimal characteristic wavelength vector for quantitative analysis, for example to predict the content of the elements associated with the characteristic wavelengths. The characteristic variable selection method and system based on the spectral variable gradient integrated genetic algorithm can be applied to quantitative analysis of fruit sugar content, meat protein content, mineral element content, soil nutrient content and similar quantities.
The method effectively reduces the probability of selecting weakly correlated wavelength variables from the visible near-infrared full spectrum, helps eliminate collinearity and redundant data, and improves the prediction precision of the substance content.
The variable importance in projection (VIP) coefficient is one of the important output parameters of the PLSR model and reflects the score the PLSR model assigns to each independent variable. When a PLSR model is established on the visible near-infrared full-spectrum wavelength variables of the calibration set, a wavelength variable whose VIP is greater than 1 is generally considered to play an important role in predicting the target variable.
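The way a VIP threshold turns the divided intervals into important wavelength intervals can be sketched as follows. The VIP values are taken as given (in practice they come from the fitted PLSR model), and the function name and the inclusive `(start, end)` interval representation are assumptions made for the illustration.

```python
def extract_important_intervals(intervals, vip, threshold=1.0):
    """Keep, within each interval, only the columns whose VIP exceeds the
    threshold; intervals left with no column are dropped."""
    important = []
    for start, end in intervals:
        kept = [j for j in range(start, end + 1) if vip[j] > threshold]
        if kept:
            important.append(kept)
    return important


intervals = [(0, 2), (3, 4), (5, 6)]               # s = 3 intervals
vip = [0.4, 1.2, 1.5, 0.9, 0.8, 1.1, 0.7]          # made-up VIP scores
important = extract_important_intervals(intervals, vip)
# important == [[1, 2], [5]]: k = 2 intervals survive out of s = 3
```

The surviving intervals need not be contiguous in wavelength, which is why they are stored as explicit lists of column numbers rather than as ranges.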
After the important wavelength intervals are obtained, the method can narrow them further to obtain new important wavelength intervals, which further raises the probability of selecting strongly correlated wavelength variables, eliminates collinearity and redundant data more effectively, and gives the established regression model better prediction precision.
According to the above method, a first embodiment of the invention provides a flow of the characteristic variable selection method of the spectral variable gradient integrated genetic algorithm. This embodiment targets soil element content analysis, can be applied to the software design of a soil near-infrared spectrum analyzer, and comprises the following steps:
Step 2: the full spectrum is divided into s wavelength intervals. In a specific embodiment the division can be implemented with the prior art; for example, the most classical method divides the full spectrum equally into N wavelength intervals, and the prior art is not described in detail here. Other embodiments may also use the preferred wavelength interval division method provided in the following embodiments;
Step 3: a partial least squares regression (PLSR) model is established with the full-spectrum wavelength variables of the calibration set, and the variable importance in projection (VIP) coefficients of the full-spectrum wavelength variables are output.
Step 4: taking "VIP of the wavelength variable greater than the predetermined value" as the extraction criterion for important wavelength intervals (the value is set to 1 in this embodiment; other embodiments may set other values as required, such as 0.5 or 0.8), k not fully contiguous important wavelength intervals (k < s), containing only wavelength variables whose VIP is greater than the predetermined value, are extracted from the s wavelength intervals.
Step 6: the population size of the genetic algorithm and the number of characteristic wavelengths are set, the reciprocal of the root mean square error of the partial least squares regression model is taken as the fitness function of a characteristic wavelength vector, and the characteristic wavelength vector with the maximum fitness value is selected as the optimal characteristic wavelength vector; selection, crossover and mutation are applied to the initial population, and the resulting new individuals replace the original population to form a new population; the iteration continues until the set number of generations is reached, and the final optimal characteristic wavelength vector is output; a regression model is then established with the optimal characteristic wavelength vector, and the predicted content of a given substance in a given sample is obtained.
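The genetic search in the last step can be sketched as a short loop. This is a simplified illustration under several assumptions that the text does not state: binary tournament selection, single-point crossover, keeping the best individual in every generation (elitism), rounding and clipping after the differential mutation, and a user-supplied `fitness` callback standing in for 1/RMSE of a PLSR model. All function and parameter names are hypothetical.

```python
import random


def run_ga(n_index, n_features, fitness, pop_size=20, generations=30,
           p_cross=0.8, p_mut=0.2, D=0.5, seed=0):
    """Search for a vector of n_features indices in 0..n_index-1 with high fitness."""
    rng = random.Random(seed)
    # Initial population: random combinations of interval-spectrum indices.
    pop = [rng.sample(range(n_index), n_features) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        new_pop = [list(best)]  # assumption: elitism keeps the current best
        while len(new_pop) < pop_size:
            # Assumption: binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            child = list(a)
            if n_features > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = a[:cut] + b[cut:]
            if rng.random() < p_mut:
                r1, r2 = rng.sample(pop, 2)
                # Differential mutation, then round and clip (assumption).
                child = [min(max(int(round(D * (r1[j] - r2[j]) + child[j])), 0),
                             n_index - 1) for j in range(n_features)]
            new_pop.append(child)
        pop = new_pop  # the new individuals replace the old population
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy fitness standing in for 1/RMSE: indices 1, 4 and 7 are "informative".
target = {1, 4, 7}
toy_fitness = lambda v: 1.0 / (1.0 + len(target - set(v)))
best, history = run_ga(10, 3, toy_fitness, pop_size=12, generations=15, seed=1)
```

Note that crossover and mutation may produce repeated indices within one vector; the sketch leaves them in place, whereas a real implementation would need to decide how to treat duplicates.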
Establishing the regression model with the optimal characteristic wavelength vector obtained by the characteristic variable selection method of the spectral variable gradient integrated genetic algorithm provided by the invention simplifies the structure of the regression model, improves the precision of near-infrared spectral analysis, and gives the regression model better generalization capability and robustness. The method can be applied to the software design of a near-infrared spectrometer. It is simple to implement, determines multiple components simultaneously, analyzes quickly at low cost, does not damage samples, consumes no chemical reagents and causes no environmental pollution. It has good prospects for popularization and application in detecting the contents of substances such as soil nutrients, fruit sugar, meat protein and mineral elements.
The regression model is established by using multiple groups of observed data $(x_i, y_i)$ of a sample set to estimate the regression coefficients in the regression equation. The method for establishing the regression model is not limited to the partial least squares regression model and can be realized by the prior art, preferably by a nonlinear regression model.
For the partial least squares regression model, the predicted value $\hat{y}$ of the substance content of a given sample has a multiple linear relationship with the wavelength variables, i.e. $\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \varepsilon$, where $\varepsilon$ is the random error, $\beta_0$ is the regression constant, $\beta_1$ to $\beta_n$ are the n regression coefficients, and $x_1$ to $x_n$ are the characteristic wavelength variables, namely the near-infrared diffuse reflection absorbance data obtained by scanning at the n characteristic wavelengths. $\beta_1$ to $\beta_n$ are estimated by establishing the partial least squares regression model and finding the regression coefficients that correspond to the minimum root mean square error between the predicted values $\hat{y}_i$ and the sample reference method test values $y_i$.
Embodiment two: On the basis of the above embodiment, in order to further raise the probability of selecting highly correlated wavelength variables and to remove collinearity and redundant data more effectively, so that the resulting regression model predicts more accurately, the method further includes the following. After the important wavelength intervals are obtained, a backward interval partial least squares regression algorithm removes one wavelength variable at a time from each selected wavelength interval until only the last wavelength variable remains, and finds, within each important wavelength interval, the combined wavelength vector corresponding to the minimum root mean square error of the partial least squares regression model; each such vector constitutes a new important wavelength interval. The new important wavelength intervals are sequentially combined into an interval spectrum (in this embodiment the important wavelength intervals of the spectral data matrix are combined in sequence; other embodiments may obtain the interval spectrum with other combination modes). The randomly combined characteristic wavelength vectors of the interval spectrum are taken as the initial population of the genetic algorithm, and the root mean square error of the partial least squares regression model is solved.
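One possible reading of the backward elimination inside a single important wavelength interval is sketched below. The text does not state how the variable to remove is chosen at each step, so the greedy rule used here (drop the variable whose removal gives the lowest RMSE) is an assumption, as are the names `backward_eliminate` and `rmse_of`. The toy `rmse_of` merely stands in for fitting a PLSR model on the given columns.

```python
from typing import Callable, List, Sequence, Tuple


def backward_eliminate(
    columns: Sequence[int],
    rmse_of: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Remove one variable at a time until one remains; return the best subset seen."""
    current = list(columns)
    best_subset, best_rmse = list(current), rmse_of(current)
    while len(current) > 1:
        # Try dropping each remaining variable; keep the drop with the lowest RMSE
        # (assumed greedy rule).
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        current = min(candidates, key=rmse_of)
        score = rmse_of(current)
        if score < best_rmse:
            best_subset, best_rmse = list(current), score
    return best_subset, best_rmse


def toy_rmse(cols: Sequence[int]) -> float:
    """Hypothetical stand-in for the RMSE of a PLSR model built on `cols`."""
    return abs(sum(cols) - 12) + 0.1 * len(cols)


new_interval, new_rmse = backward_eliminate([3, 4, 5, 6], toy_rmse)
```

The subset returned for each important interval would then play the role of the new important wavelength interval.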
Embodiment three: On the basis of the above embodiment, in order to reflect more accurately how much each wavelength variable contributes to predicting the target variable, the method further includes dividing the full spectrum into s wavelength intervals as follows:
Calculate the purity row vector of the full-spectrum wavelength variables and its linear purity gradient vector in the horizontal direction; then divide the full spectrum into a plurality of wavelength intervals at the sign changes of the gradient values in the linear purity gradient vector of the correction-set full-spectrum wavelength variables. This division is better founded than the traditional method of manually dividing the full spectrum into N equally spaced wavelength intervals: a sign change of the purity gradient marks a change in the trend of the useful information in the spectral data, so dividing at the sign-change points of the linear purity gradient yields wavelength intervals with strong interpretability for the target variable.
Specific embodiments use a concentration gradient method to divide the spectral data matrix sample set into a correction set and a validation set. The purity row vector of the correction-set full-spectrum wavelength variables is a row vector whose elements are the purity values of the individual wavelength variables. The purity value of a wavelength variable equals the standard deviation of the spectral data column vector produced by scanning all samples at that visible near-infrared wavelength, divided by the mean of that column vector: $p_i = \sigma_i / \mu_i$ ($i = 1, \dots, n$), where $p_i$ is the purity of the i-th spectral wavelength variable, $\sigma_i$ is the standard deviation of all sample data at the i-th spectral wavelength, $\mu_i$ is the mean of all sample data at the i-th spectral wavelength, and n is the order of the purity row vector of the full-spectrum wavelength variables.
The magnitude of the wavelength variable purity value reflects the magnitude of the contribution of the wavelength variable to the predicted target variable.
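The purity definition above can be illustrated with a short pure-Python sketch. The function name `purity_vector` and the toy matrix are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or the sample standard deviation is meant.

```python
import statistics
from typing import List, Sequence


def purity_vector(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Return p_i = sigma_i / mu_i for each wavelength column of `spectra`.

    `spectra` has one row per sample and one column per wavelength.
    """
    columns = list(zip(*spectra))
    # Population standard deviation is an assumption (see the lead-in).
    return [statistics.pstdev(col) / statistics.mean(col) for col in columns]


# Two samples, two wavelengths: the first column varies, the second is constant.
toy_spectra = [[1.0, 2.0], [3.0, 2.0]]
p = purity_vector(toy_spectra)
```

A constant column has zero purity, which matches the reading that a wavelength carrying no variation across samples contributes little to prediction.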
The purity gradient row vector of the correction-set full-spectrum wavelength variables is a row vector whose elements are computed from adjacent purity values, taken from left to right in the horizontal direction, in the purity row vector of the full-spectrum wavelength variables. Each purity gradient value is calculated as follows:
the 1st element is $g_1 = p_1 - p_2$, the element in the i-th column is $g_i = p_{i+1} - p_{i-1}$ for $2 \le i \le n-1$, and the n-th element is $g_n = p_n - p_{n-1}$. Here $p_1, p_2, p_{i-1}, p_{i+1}, p_{n-1}, p_n$ are purity elements of the full-spectrum variable purity row vector $P = [p_1, p_2, \dots, p_n]$; $g_1, g_i, g_n$ are the purity gradient elements of the 1st, i-th and n-th columns of the wavelength variable purity gradient vector; and n is the order of the full-spectrum wavelength variable purity gradient row vector. The purity gradient value of a wavelength variable reflects the rate of change of the spectral variable purity value.
The larger the purity gradient value of a wavelength variable, the larger the contribution of that wavelength variable to the prediction target variable, and the higher the possibility of finding a potential characteristic variable. If the purity gradient of a spectral variable is positive, the purity of the spectral variable is increasing at that wavelength point, and vice versa. If the purity gradient value of a spectral variable is zero, the wavelength variable contributes little to the prediction target variable.
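The gradient formulas and the sign-change division can be sketched as follows. The gradient follows the three element formulas exactly as stated in the text; the function names, the half-open `(start, stop)` interval layout and the treatment of a zero gradient as its own sign are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def purity_gradient(p: Sequence[float]) -> List[float]:
    """Purity gradient row vector, element formulas as stated in the text."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two purity values")
    g = [p[0] - p[1]]                                    # g_1 = p_1 - p_2
    g += [p[i + 1] - p[i - 1] for i in range(1, n - 1)]  # g_i = p_{i+1} - p_{i-1}
    g.append(p[-1] - p[-2])                              # g_n = p_n - p_{n-1}
    return g


def split_on_sign_change(g: Sequence[float]) -> List[Tuple[int, int]]:
    """Return half-open column ranges; a new range starts where the sign of g flips."""
    def sign(x: float) -> int:
        return (x > 0) - (x < 0)

    intervals = []
    start = 0
    for i in range(1, len(g)):
        if sign(g[i]) != sign(g[i - 1]):
            intervals.append((start, i))
            start = i
    intervals.append((start, len(g)))
    return intervals


p = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0]
g = purity_gradient(p)
intervals = split_on_sign_change(g)
```

Because only the signs of the gradient elements are used, the resulting intervals are unequally spaced and follow the rise and fall of the purity curve.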
Embodiment four: The method flow of this embodiment is shown in Fig. 2. On the basis of the above embodiment, this embodiment addresses the long modeling time, weak model generalization and low prediction accuracy of near-infrared spectral analysis that arise because the spectral variables are numerous, the spectral information overlaps easily, and the data contain redundancy and a large amount of noise. An improved genetic algorithm selects the characteristic wavelengths from the interval spectrum into which the important wavelength intervals provided by the invention are sequentially combined (in this embodiment the important wavelength intervals of the spectral data matrix are combined in sequence; other embodiments may obtain the interval spectrum with other combination modes), and the chromosomes are real-number coded. The number of optimal characteristic wavelengths is set to a constant between 15 and 100, and the number of generations is set to a constant between 100 and 200.
The improved genetic algorithm adopts an improved real number coding differential mutation operator, and the calculation formula is as follows:
Z(i,j)=D×(E(r1,j)-E(r2,j))+E(i,j),
wherein Z(i,j) represents the real number-encoded offspring value of the j-th chromosome of the i-th individual, D represents the mutation factor, E(r1,j) represents the real number-encoded parent value of the j-th chromosome of the r1-th individual randomly generated in the population, E(r2,j) represents the real number-encoded parent value of the j-th chromosome of the r2-th individual randomly generated in the population, and E(r1,j)-E(r2,j) represents the difference between the real number-encoded parent value of the j-th chromosome of the r1-th individual and that of the r2-th individual. E(i,j) represents the real number-encoded parent value of the j-th chromosome of the i-th individual.
The improved differential mutation operator enlarges the search space for the global optimal solution, enables the improved genetic algorithm to find the global optimal solution, and converges quickly.
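The mutation formula Z(i,j)=D×(E(r1,j)-E(r2,j))+E(i,j) can be written directly in code. In this sketch the function name, the rule that r1 and r2 differ from each other, and the clipping of the result to `[lower, upper]` are assumptions; the text only gives the formula itself.

```python
import random
from typing import List, Sequence


def differential_mutation(
    population: Sequence[Sequence[float]],
    i: int,
    d: float,
    rng: random.Random,
    lower: float,
    upper: float,
) -> List[float]:
    """Return Z(i, j) = D * (E(r1, j) - E(r2, j)) + E(i, j) for every j."""
    # r1 and r2 are drawn at random from the population; drawing them distinct
    # from each other is an assumption.
    r1, r2 = rng.sample(range(len(population)), 2)
    child = []
    for j, gene in enumerate(population[i]):
        z = d * (population[r1][j] - population[r2][j]) + gene
        # Clipping keeps the value inside the index range (an assumed repair step).
        child.append(min(max(z, lower), upper))
    return child


rng = random.Random(42)
population = [[rng.uniform(0.0, 99.0) for _ in range(5)] for _ in range(8)]
child = differential_mutation(population, i=0, d=0.5, rng=rng, lower=0.0, upper=99.0)
```

Since the chromosome is real-number coded, a value such as `child[j]` would presumably be rounded to an integer column index before the spectral data are looked up; that rounding step is not described in the text.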
In this embodiment, wavelength intervals are divided at the sign changes of the purity gradient values of the correction-set visible near-infrared full-spectrum wavelength variables; important wavelength intervals are extracted using the criterion that the variable importance in projection coefficient output by the partial least squares regression (PLSR) model is greater than 1; new important wavelength intervals are screened within the important wavelength intervals by the backward interval PLS regression algorithm (BiPLSR); and all the new important wavelength intervals are combined into an interval spectrum. The improved genetic algorithm then selects, within the interval spectrum, the characteristic wavelength vector corresponding to the minimum root mean square error (RMSE) of the PLSR model as the optimal characteristic wavelength vector. This embodiment raises the probability that the improved genetic algorithm selects strongly correlated wavelength variables in the interval spectrum, reduces the probability of selecting weakly correlated wavelength variables in the visible near-infrared full spectrum, helps eliminate collinearity and redundant data, and improves the prediction accuracy of the regression model.
Embodiment five: In order to make the generation of the characteristic wavelength vector population and the acquisition of the spectral matrix data simple and easy, on the basis of the above embodiment, the important wavelength intervals containing wavelength variables whose importance projection coefficients are greater than a preset value are extracted from all wavelength intervals and sequentially combined into an interval spectrum. In this embodiment the important wavelength intervals of the spectral data matrix are combined in sequence; other embodiments may obtain the interval spectrum with other combination methods.
The specific method for sequentially combining the important wavelength intervals into an interval spectrum is as follows:
Convert the wavelength column numbers in all the important wavelength intervals into the wavelength index number row vector of the interval spectrum. The column-number range of this row vector is the value range of the elements of a characteristic wavelength vector. Each column of data of the spectral data matrix is obtained through the mapping table between the column numbers and the interval-spectrum wavelength index number row vector, and a partial least squares regression model is established, so that generating the characteristic wavelength vector population and acquiring the spectral matrix data are both simple and easy to implement.
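The mapping table described above can be sketched as a flat list whose position is the interval-spectrum column number and whose value is the original full-spectrum column number. The function names and the half-open `(start, stop)` interval layout are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def build_index_map(important_intervals: Sequence[Tuple[int, int]]) -> List[int]:
    """Concatenate, in order, the full-spectrum column numbers of the intervals."""
    index_map: List[int] = []
    for start, stop in important_intervals:
        index_map.extend(range(start, stop))
    return index_map


def select_columns(
    spectra: Sequence[Sequence[float]],
    index_map: Sequence[int],
    chromosome: Sequence[int],
) -> List[List[float]]:
    """Fetch the spectral columns addressed by a characteristic wavelength vector."""
    cols = [index_map[g] for g in chromosome]
    return [[row[c] for c in cols] for row in spectra]


# Toy full spectrum: 2 samples x 10 wavelengths.
spectra = [[10 * r + c for c in range(10)] for r in range(2)]
index_map = build_index_map([(2, 4), (7, 9)])
subset = select_columns(spectra, index_map, chromosome=[0, 3])
```

With this layout the genetic algorithm only needs to draw integers in `range(len(index_map))`, which is what makes population generation simple.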
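Embodiment six: On the basis of the above embodiment, each column of data of the correction-set data matrix is obtained, a partial least squares regression model is established, and the reciprocal (1/RMSE) of its root mean square error (RMSE) is taken as the fitness function. The fitness function F is calculated as follows: $F = 1/\mathrm{RMSE}$, with $\mathrm{RMSE} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}(y_i - \hat{y}_i)^2}$, where $y_i$ is the reference method test value of the i-th sample, $\hat{y}_i$ is the predicted value of the partial least squares regression model for the i-th sample, and $n_p$ is the number of samples.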
In a specific embodiment, the relevant parameters may also be adjusted according to the actual application, for example by changing the chromosomes and their length in the genetic algorithm, or by further processing the root mean square error of the partial least squares regression model, and adjusting the fitness function accordingly.
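The fitness function of embodiment six, the reciprocal of the RMSE, can be sketched as below. The function names are hypothetical, and returning infinity for a perfect fit is an assumed guard against division by zero that the text does not discuss.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between reference values and predictions."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def fitness(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """F = 1 / RMSE; a smaller prediction error gives a larger fitness."""
    err = rmse(y_true, y_pred)
    # Guard for a perfect fit (assumed behaviour).
    return math.inf if err == 0 else 1.0 / err


f = fitness([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here the predictions would come from a PLSR model built on the columns chosen by one characteristic wavelength vector.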
The following is experimental data for the specific embodiment shown in fig. 2:
The wavelength range of the visible near-infrared spectrum is 350-2500 nm. A spectrometer scanned 193 soil samples in the visible near-infrared spectrum with the resolution set to 1 nm and the wavelength range set to 350 nm to 1655 nm (1306 wavelengths), generating a 193 × 1306 sample set of the soil available (quick-acting) phosphorus diffuse reflectance spectral data matrix. After the diffuse reflectance spectral data sample set was preprocessed, the concentration gradient sample division method was used to divide the 193-sample spectral data matrix sample set into 157 correction set samples and 36 verification set samples according to the proportion of 3:1. Table 1 gives the statistics of the reference method test values for the available phosphorus content of the 193 soil samples. As can be seen from Table 1, the correction set and the verification set of the reference method test values have similar standard deviation distributions, but the dispersion is large. The reference method test value refers to the value of a substance content measured by a chemical method or other methods.
Table 1. Statistical data of reference method test values for the available phosphorus content of 193 soil samples
Then, the purity row vector of the visible near-infrared full-spectrum wavelength variables of the soil available phosphorus correction set is calculated; the corresponding purity curve is shown in Fig. 3.
Next, the purity gradient row vector, in the horizontal direction, of the visible near-infrared full-spectrum wavelength variable purity row vector of the soil available phosphorus correction set is calculated; the corresponding purity gradient curve is shown in Fig. 4. As can be seen from Fig. 4, the purity gradient curve can be divided into 3 peak wavelength ranges: the maximum peak lies in 800-1200 nm, the medium peak in 1200-1655 nm, and the small peak in 350-800 nm.
The full spectrum is then divided into a plurality of unequally spaced wavelength intervals at the sign changes of the purity gradient element values in the visible near-infrared full-spectrum variable purity gradient vector of the soil available phosphorus correction set.
A PLSR model is established using the visible near-infrared full-spectrum wavelength variables of the soil available phosphorus correction set, and the variable importance in projection (VIP) coefficients of the full-spectrum wavelength variables are output. The VIP curve of the visible near-infrared full-spectrum PLSR model of the soil available phosphorus correction set is shown in Fig. 5.
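For reference, the VIP coefficient output by a PLSR model is commonly defined as shown here. This is the usual chemometrics definition rather than a formula stated in this document, and the symbols $A$, $w_{ja}$ and $SS_a$ are introduced here as assumptions; $n$ is the number of full-spectrum wavelength variables.

$$\mathrm{VIP}_j = \sqrt{\frac{n \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert w_a \rVert \right)^2}{\sum_{a=1}^{A} SS_a}}$$

where $A$ is the number of latent components, $w_{ja}$ is the weight of wavelength variable $j$ in component $a$, and $SS_a$ is the sum of squares of the target variable explained by component $a$. Under this definition the mean of $\mathrm{VIP}_j^2$ over all $n$ variables equals 1, which is one reason a threshold of 1 is a common choice.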
Taking a VIP greater than 1 as the criterion for extracting important wavelength intervals, any wavelength interval containing a wavelength variable with VIP greater than 1 is extracted as an important wavelength interval. The wavelength column numbers of all the important wavelength intervals are converted into wavelength index number row vectors and sequentially merged into the interval-spectrum wavelength index number row vector (other embodiments may obtain the interval-spectrum wavelength index number row vector with other merging manners).
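The extraction criterion in this paragraph can be sketched as a simple filter over the divided intervals. The function name, the half-open `(start, stop)` interval layout and the toy VIP values are hypothetical; the rule that an interval qualifies when it contains at least one variable with VIP above the threshold follows the wording of this paragraph.

```python
from typing import List, Sequence, Tuple


def extract_important_intervals(
    vip: Sequence[float],
    intervals: Sequence[Tuple[int, int]],
    threshold: float = 1.0,
) -> List[Tuple[int, int]]:
    """Keep the intervals that contain at least one variable with VIP > threshold."""
    return [
        (start, stop)
        for start, stop in intervals
        if any(v > threshold for v in vip[start:stop])
    ]


# Toy VIP curve for 8 wavelength variables split into s = 3 intervals.
vip = [0.2, 0.4, 1.3, 0.9, 0.1, 0.3, 1.1, 1.6]
intervals = [(0, 3), (3, 6), (6, 8)]
important = extract_important_intervals(vip, intervals)
```

In this toy case k = 2 of the s = 3 intervals survive, and the surviving intervals are not contiguous.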
Finally, the population size of the improved genetic algorithm is set to 100, the number of characteristic wavelengths to 25, and the number of generations to 100, and the column-number range of the important-interval spectral wavelength index number row vector is taken as the search space of the characteristic wavelength index numbers. The column data of the visible near-infrared spectral data matrix of the soil available phosphorus correction set are obtained through the characteristic wavelength index numbers, a PLSR model is established, and the reciprocal of the root mean square error (RMSE) of the PLSR model is taken as the fitness function F of each characteristic wavelength vector individual. The iterative optimization curve of the fitness function F is shown in Fig. 6. The 25 optimal characteristic wavelength index numbers selected by the improved genetic algorithm are converted into the 25 optimal characteristic wavelength values of the soil available phosphorus visible near-infrared spectrum, listed in Table 2. The conversion formula between the wavelength index number and the wavelength value is: wavelength value = wavelength index number + 350 (nm).
Table 2. The 25 optimal characteristic wavelength values of the soil available phosphorus visible near-infrared spectrum selected by the improved genetic algorithm
The distribution of the 25 optimal characteristic wavelengths selected by the improved genetic algorithm is shown in Fig. 7. There are 13 optimal characteristic wavelengths (855nm, … nm and 1198nm) in the 800-1200 nm range of the maximum peak of the spectral wavelength variable purity gradient, 1 optimal characteristic wavelength (1398 nm) in the 1200-1655 nm range of the medium peak, and 11 optimal characteristic wavelengths (360nm, … nm and 498nm) in the 350-800 nm range of the small peak. This demonstrates the accuracy with which the characteristic wavelength selection method of the visible near-infrared spectral variable gradient integrated genetic algorithm divides the important interval spectrum.
The beneficial effects of this embodiment of the invention are as follows: the wavelength intervals are divided using the purity gradient values of the visible near-infrared full-spectrum wavelength variables of the soil available phosphorus correction set, and the important wavelength intervals with strong interpretability for predicting the soil available phosphorus content are extracted, using the criterion that the VIP of the PLSR model is greater than 1, to form an interval spectrum. This greatly increases the probability that the improved genetic algorithm selects potential characteristic variables in the interval spectrum, simplifies the structure of the regression model, reduces the amount of calculation, and improves the prediction accuracy of the soil available phosphorus content.
A further embodiment: a characteristic wavelength selection system of the spectral variable gradient integrated genetic algorithm, comprising:
the partial least squares regression model establishing module, used for scanning a plurality of samples with visible near-infrared spectrum scanning equipment to generate a spectral data matrix, establishing a partial least squares regression model for the full-spectrum wavelength variables contained in the spectral data matrix, and determining the importance projection coefficients of the full-spectrum wavelength variables;
the wavelength interval division module is used for dividing the full spectrum of the spectrum data matrix into a plurality of wavelength intervals;
the important wavelength interval determining module, used for extracting, from all the wavelength intervals, the wavelength intervals containing wavelength variables whose importance projection coefficient is greater than the preset value, to obtain the important wavelength intervals;
the genetic algorithm selection module, used for combining the important wavelength intervals of the spectral data matrix into an interval spectrum, taking the randomly combined characteristic wavelength vectors of the interval spectrum as the initial population of the genetic algorithm, and solving the root mean square error of the partial least squares regression model; taking the reciprocal of the root mean square error of the partial least squares regression model as the fitness function of a characteristic wavelength vector, and selecting the characteristic wavelength vector with the maximum fitness value as the optimal characteristic wavelength vector; applying selection, crossover and mutation to the initial population, and replacing the original population with the new individuals obtained to form a new population; and iterating until the set number of generations is reached, and outputting the final optimal characteristic wavelength vector.
Using the optimal characteristic wavelength vector to establish the regression model for predicting substance content effectively improves the prediction precision of the regression model, simplifies its structure, and gives it better generalization capability and robustness. This provides a new method of selecting characteristic wavelength variables for the design of near-infrared spectrum analyzers.
On the basis of the above embodiment, the important wavelength interval determining module is further used for, after the important wavelength intervals are obtained, removing one wavelength variable at a time from each selected wavelength interval by a backward interval partial least squares regression algorithm until only the last wavelength variable remains, finding the combined wavelength vector corresponding to the minimum root mean square error of the partial least squares regression model in each important wavelength interval, and constructing each new important wavelength interval.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A characteristic wavelength selection method of a spectral variable gradient integrated genetic algorithm is characterized by comprising the following steps:
scanning a plurality of samples by using visible near infrared spectrum scanning equipment to generate a visible near infrared spectrum data matrix, establishing a partial least square regression model for full spectrum wavelength variables contained in the visible near infrared spectrum data matrix, and determining importance projection coefficients of the full spectrum wavelength variables;
dividing the full spectrum of the visible near infrared spectrum data matrix into a plurality of wavelength intervals, extracting from all the wavelength intervals the important wavelength intervals containing wavelength variables whose importance projection coefficients are larger than a preset value, and combining the important wavelength intervals into an interval spectrum;
taking the random combination characteristic wavelength vector of the interval spectrum as an initial population of the genetic algorithm, and solving the root mean square error of the partial least square regression model;
taking the reciprocal of the root mean square error of the partial least squares regression model as a fitness function of the characteristic wavelength vector, and selecting the characteristic wavelength vector with the maximum fitness value as an optimal characteristic wavelength vector; performing selection, crossover and mutation on the initial population, and replacing the original population with the new individuals obtained to form a new population; and iterating until the set number of generations is reached, and outputting a final optimal characteristic wavelength vector.
2. The method for selecting the characteristic wavelength of the spectral variable gradient integrated genetic algorithm according to claim 1, wherein the method comprises the following steps: after the important wavelength intervals are obtained, removing one wavelength variable at a time from each important wavelength interval by using a backward interval partial least squares regression algorithm until only the last wavelength variable remains, searching for the wavelength combination vector corresponding to the minimum root mean square error of the partial least squares regression model in each important wavelength interval, constructing each new important wavelength interval and combining the new important wavelength intervals into an interval spectrum, taking the random combination characteristic wavelength vector of the interval spectrum as an initial population of the genetic algorithm, and solving the root mean square error of the partial least squares regression model.
3. The method for selecting the characteristic wavelength of the spectral variable gradient integrated genetic algorithm according to claim 1, wherein the method comprises the following steps: the method for dividing the full spectrum into a plurality of wavelength intervals is as follows:
calculating a purity row vector of the full-spectrum wavelength variable and a purity gradient vector of the full-spectrum wavelength variable in the horizontal direction; and dividing the full spectrum into a plurality of wavelength intervals by using the positive and negative changes of the gradient value in the gradient vector of the full spectrum wavelength variable purity.
4. The method for selecting the characteristic wavelength of the spectral variable gradient integrated genetic algorithm according to claim 1, wherein the method comprises the following steps: the expression of the fitness function F of the characteristic wavelength vector is as follows:
F=1/RMSE,
wherein RMSE is the root mean square error of the partial least squares regression model established on the column data of the full spectrum data matrix, $y_i$ is the reference method test value of the i-th sample, $\hat{y}_i$ is the predicted value of the partial least squares regression model for the characteristic wavelength variables of the i-th sample, and $n_p$ is the number of samples.
5. The method for selecting the characteristic wavelength of the spectral variable gradient integrated genetic algorithm according to claim 1, wherein the method comprises the following steps: forming a new population according to the population size, the cross probability, the mutation probability and the selection probability of the selected genetic algorithm, wherein the mutation operator adopts a real number coding differential mutation operator, and the calculation formula is as follows:
Z(i,j)=D×(E(r1,j)-E(r2,j))+E(i,j),
wherein Z(i,j) represents the real number-encoded offspring value of the j-th chromosome of the i-th individual, D represents the mutation factor, E(r1,j) represents the real number-encoded parent value of the j-th chromosome of the r1-th individual randomly generated in the population, E(r2,j) represents the real number-encoded parent value of the j-th chromosome of the r2-th individual randomly generated in the population, E(r1,j)-E(r2,j) represents the difference between the real number-encoded parent value of the j-th chromosome of the r1-th individual and that of the r2-th individual, and E(i,j) represents the real number-encoded parent value of the j-th chromosome of the i-th individual.
6. The method for selecting the characteristic wavelength of the spectral variable gradient integrated genetic algorithm according to claim 1, wherein the method comprises the following steps: the method for extracting important wavelength intervals with the important projection coefficients of the wavelength variables larger than a preset value from all the wavelength intervals and combining the important wavelength intervals into an interval spectrum comprises the following steps:
converting the wavelength column numbers in all the important wavelength intervals into wavelength index number row vectors of interval spectrums; and the column number range of the wavelength index number row vector of the interval spectrum is the value range of the characteristic wavelength vector elements, and each column of data of the spectrum data matrix is obtained through a mapping table of the column number and the interval spectrum wavelength index number row vector.
7. The method for selecting the characteristic wavelength of the spectral variable gradient integrated genetic algorithm according to claim 1, wherein the number of optimal characteristic wavelengths is set to a constant between 15 and 100, and the number of evolution generations is set to a constant between 100 and 200.
8. A system for selecting characteristic wavelengths of a spectral variable gradient integrated genetic algorithm, comprising:
the partial least squares regression model establishing module is used for scanning a plurality of samples by utilizing visible near infrared spectrum scanning equipment to generate a visible near infrared spectrum data matrix, establishing a partial least squares regression model for the full spectrum wavelength variables contained in the visible near infrared spectrum data matrix, and determining the importance projection coefficients of the full spectrum wavelength variables;
the wavelength interval division module is used for dividing the full spectrum of the spectrum data matrix into a plurality of wavelength intervals;
the important wavelength interval determining module is used for extracting, from all the wavelength intervals, the wavelength intervals containing wavelength variables whose importance projection coefficient is greater than the preset value, to obtain the important wavelength intervals;
the genetic algorithm selection module is used for combining the important wavelength intervals of the spectrum data matrix into an interval spectrum, taking randomly combined characteristic wavelength vectors of the interval spectrum as the initial population of the genetic algorithm, and solving the root mean square error of the partial least squares regression model; taking the reciprocal of the root mean square error of the partial least squares regression model as the fitness function of the characteristic wavelength vector, and selecting the characteristic wavelength vector with the maximum fitness value as the optimal characteristic wavelength vector; performing selection, crossover and mutation on the initial population, and replacing the original population with the obtained new individuals to form a new population; and iterating until the set number of evolution generations is reached, and outputting the final optimal characteristic wavelength vector.
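The loop performed by the genetic algorithm selection module can be sketched in Python as below. This is a hedged illustration, not the patented implementation: `run_ga` and `rmse_of` are hypothetical names, `rmse_of` stands in for fitting a partial least squares regression model on the selected wavelengths and returning its root mean square error, and the specific operators (fitness-proportional selection, single-point crossover, random-reset mutation) are assumed choices that the claim does not fix.

```python
import random


def fitness(rmse):
    """Fitness of a characteristic wavelength vector: reciprocal of the RMSE."""
    return 1.0 / rmse


def run_ga(rmse_of, n_index, n_genes, pop_size, generations,
           p_cross=0.8, p_mut=0.1, seed=0):
    """Evolve wavelength vectors whose genes are interval-spectrum positions.

    rmse_of(vector) must return a positive RMSE. pop_size must be even and
    n_genes at least 2 in this sketch. Crossover and mutation may produce
    repeated wavelengths within one vector; this sketch does not repair them.
    """
    if pop_size % 2 or n_genes < 2:
        raise ValueError("pop_size must be even and n_genes >= 2")
    rng = random.Random(seed)
    # Initial population: random combinations of interval-spectrum positions.
    pop = [rng.sample(range(n_index), n_genes) for _ in range(pop_size)]
    best = max(pop, key=lambda v: fitness(rmse_of(v)))
    for _ in range(generations):
        scores = [fitness(rmse_of(v)) for v in pop]
        # Selection: fitness-proportional sampling (assumed scheme).
        parents = rng.choices(pop, weights=scores, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cross:
                # Single-point crossover (assumed operator).
                cut = rng.randrange(1, n_genes)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        for child in children:
            for j in range(n_genes):
                if rng.random() < p_mut:
                    # Random-reset mutation (assumed operator).
                    child[j] = rng.randrange(n_index)
        pop = children  # the new individuals replace the original population
        gen_best = max(pop, key=lambda v: fitness(rmse_of(v)))
        if fitness(rmse_of(gen_best)) > fitness(rmse_of(best)):
            best = gen_best
    return best
```

Tracking `best` across generations means the returned vector is never worse than the best individual of the initial population, even though the population itself is fully replaced each generation.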
9. The system of claim 8, wherein the important wavelength interval determining module is further used for, after obtaining the important wavelength intervals, applying a backward interval partial least squares regression algorithm that removes one wavelength variable at a time from each selected wavelength interval until only the last wavelength variable remains, and finding, in each important wavelength interval, the combined wavelength vector corresponding to the minimum root mean square error of the partial least squares regression model, so as to construct each new important wavelength interval.
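The backward elimination described in claim 9 might look like the following sketch for a single interval. The name `backward_eliminate` and the callback `rmse_of` (a stand-in for fitting a partial least squares regression model on the given columns) are hypothetical. The claim does not say how the variable to remove is chosen at each step; the sketch assumes the removal that yields the lowest RMSE.

```python
def backward_eliminate(interval_cols, rmse_of):
    """Shrink one interval by one variable at a time and keep the best subset.

    interval_cols: column numbers of the wavelength variables in the interval.
    rmse_of(cols): returns the RMSE of a model built on those columns.
    Returns (best_cols, best_rmse) over all subsets visited, from the full
    interval down to a single remaining variable.
    """
    current = list(interval_cols)
    best_cols, best_rmse = current[:], rmse_of(current)
    while len(current) > 1:
        # Try dropping each remaining variable; keep the drop with lowest RMSE.
        trials = [[c for c in current if c != drop] for drop in current]
        current = min(trials, key=rmse_of)
        current_rmse = rmse_of(current)
        if current_rmse < best_rmse:
            best_cols, best_rmse = current[:], current_rmse
    return best_cols, best_rmse


# Usage with a toy error function in place of a real regression model.
toy_rmse = lambda cols: abs(sum(cols) - 7) + 0.1 * len(cols)
cols, err = backward_eliminate([2, 3, 4, 5], toy_rmse)
```

With the toy error function above, the search visits [2, 3, 4], then [3, 4], then [4], and returns [3, 4] as the subset with the smallest error.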
10. The system of claim 8, wherein the spectral data matrix comprises a calibration set sample spectral data matrix, a calibration set sample reference method test value matrix, a validation set sample spectral data matrix, and a validation set sample reference method test value matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911006149.XA CN110726694A (en) | 2019-10-22 | 2019-10-22 | Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110726694A (en) | 2020-01-24 |
Family
ID=69222756
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911006149.XA Pending CN110726694A (en) | 2019-10-22 | 2019-10-22 | Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110726694A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5435309A (en) * | 1993-08-10 | 1995-07-25 | Thomas; Edward V. | Systematic wavelength selection for improved multivariate spectral analysis |
CN101430276A (en) * | 2008-12-15 | 2009-05-13 | 北京航空航天大学 | Wavelength variable optimization method in spectrum analysis |
CN102305772A (en) * | 2011-07-29 | 2012-01-04 | 江苏大学 | Method for screening characteristic wavelength of near infrared spectrum features based on heredity kernel partial least square method |
CN105630743A (en) * | 2015-12-24 | 2016-06-01 | 浙江大学 | Spectrum wave number selection method |
Non-Patent Citations (6)
Title |
---|
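Quansheng Chen: "Determination of total polyphenols content in green tea using FT-NIR spectroscopy and different PLS algorithms", Journal of Pharmaceutical and Biomedical Analysis * |
Department of Mathematics, Beijing University of Posts and Telecommunications: "Advanced Mathematics, Volume 2", 31 July 2012, Beijing University of Posts and Telecommunications Press * |
Yao Xifan et al.: "Manufacturing Internet of Things Technology", 31 October 2018, Huazhong University of Science and Technology Press * |
Zhang Mingjin et al.: "Application of a wavelength selection method based on variable purity in near-infrared spectral analysis", Computers and Applied Chemistry * |
Li Zhigang: "Spectral Data Processing and Quantitative Analysis Techniques", 30 June 2017, Beijing University of Posts and Telecommunications Press * |
Zou Xiaobo et al.: "Nondestructive Detection Technology and Data Analysis Methods for Agricultural Products", 31 January 2018, China Light Industry Press * |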
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111693487A (en) * | 2020-05-28 | 2020-09-22 | 济南大学 | Fruit sugar degree detection method and system based on genetic algorithm and extreme learning machine |
CN112444500A (en) * | 2020-11-11 | 2021-03-05 | 东北大学秦皇岛分校 | Alzheimer's disease intelligent detection device based on spectrum |
CN112881333A (en) * | 2021-01-13 | 2021-06-01 | 江南大学 | Near infrared spectrum wavelength screening method based on improved immune genetic algorithm |
CN113075148B (en) * | 2021-03-22 | 2023-06-16 | 久泰能源(准格尔)有限公司 | Method for measuring carbon content of catalyst surface in MTO process |
CN113075148A (en) * | 2021-03-22 | 2021-07-06 | 久泰能源(准格尔)有限公司 | Method for measuring carbon content on surface of catalyst in MTO (methanol to olefin) process |
CN113267466A (en) * | 2021-04-02 | 2021-08-17 | 中国科学院合肥物质科学研究院 | Fruit sugar degree and acidity nondestructive testing method based on spectral wavelength optimization |
CN114166764A (en) * | 2021-11-09 | 2022-03-11 | 中国农业科学院农产品加工研究所 | Method and device for constructing spectral feature model based on feature wavelength screening |
CN114019082A (en) * | 2021-11-19 | 2022-02-08 | 安徽省农业科学院土壤肥料研究所 | Soil organic matter content monitoring method and system |
CN114019082B (en) * | 2021-11-19 | 2024-05-14 | 安徽省农业科学院土壤肥料研究所 | Soil organic matter content monitoring method and system |
CN115326747A (en) * | 2022-08-08 | 2022-11-11 | 江西绿萌科技控股有限公司 | Method for detecting fruit surface rot by short-wave near infrared |
CN116026780A (en) * | 2023-03-28 | 2023-04-28 | 江西中医药大学 | Method and system for online detection of coating moisture absorption rate based on series strategy wavelength selection |
CN116026780B (en) * | 2023-03-28 | 2023-07-14 | 江西中医药大学 | Method and system for online detection of coating moisture absorption rate based on series strategy wavelength selection |
CN116959628A (en) * | 2023-07-25 | 2023-10-27 | 安及义实业(上海)有限公司 | Method and device for analyzing substance components in whole cell culture process |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110726694A (en) | Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm | |
Yun et al. | A hybrid variable selection strategy based on continuous shrinkage of variable space in multivariate calibration | |
CN101430276B (en) | Wavelength variable optimization method in spectrum analysis | |
CN110455722A (en) | Rubber tree blade phosphorus content EO-1 hyperion inversion method and system | |
US20210247367A1 (en) | Workflow-based model optimization method for vibrational spectral analysis | |
CN107817223A (en) | The construction method of quick nondestructive real-time estimate oil property model and its application | |
Yu et al. | Prediction of soil properties based on characteristic wavelengths with optimal spectral resolution by using Vis-NIR spectroscopy | |
US8635258B2 (en) | Alignment of multiple liquid chromatography-mass spectrometry runs | |
Chen et al. | A novel variable selection method based on stability and variable permutation for multivariate calibration | |
CN106248621A (en) | A kind of evaluation methodology and system | |
CN106650926A (en) | Robust boosting extreme learning machine integrated modeling method | |
Xia et al. | Non-destructive analysis the dating of paper based on convolutional neural network | |
CN114611582B (en) | Method and system for analyzing substance concentration based on near infrared spectrum technology | |
CN115398552A (en) | Use of genetic algorithms for identifying sample features based on raman spectroscopy | |
CN114062306B (en) | Near infrared spectrum data segmentation preprocessing method | |
CN109063767B (en) | Near infrared spectrum modeling method based on sample and variable consensus | |
CN114282446A (en) | Fitting prediction method based on different preference spectrum models | |
CN113125377A (en) | Method and device for detecting diesel oil property based on near infrared spectrum | |
CN113418889A (en) | Real-time detection method for water content and total number of bacterial colonies of dried vegetables based on deep learning | |
CN117074333B (en) | COD soft measurement model construction method based on ultraviolet-visible light absorption spectrum | |
Follett et al. | Achieving parsimony in bayesian VARs with the horseshoe prior | |
CN112651537A (en) | Photovoltaic power generation ultra-short term power prediction method and system | |
CN110632024B (en) | Quantitative analysis method, device and equipment based on infrared spectrum and storage medium | |
Liu et al. | Non-destructive discrimination of honey origin based on multispectral information fusion technology | |
CN117871459A (en) | Mutton crude fat content determination method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200124 |