CN118429564A - Machine learning-based three-dimensional modeling method for soil of south hills - Google Patents
Machine learning-based three-dimensional modeling method for soil of south hills
- Publication number
- CN118429564A (application CN202410881760.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- soil
- gradient
- sampling
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Geometry (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Remote Sensing (AREA)
- Computer Graphics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a machine learning-based three-dimensional modeling method for southern hilly soil. Soil data from southern hilly areas are collected, including soil type, texture, water content, organic matter content and pH value. A support vector machine (SVM) is used to build a three-dimensional model of the southern hilly soil, so that the soil can be classified and suitable crops can be selected, improving crop yield and quality. The model also helps in understanding soil indexes such as nutrient content and acidity or alkalinity, so that a suitable soil improvement plan can be formulated.
Description
Technical Field
The invention relates to the field of three-dimensional modeling, in particular to a machine learning-based southern hilly soil three-dimensional modeling method.
Background
Current crop planting and soil management often rely on traditional experience and conventional soil testing methods, which are limited in their ability to classify and evaluate soil accurately on a large scale.
Traditional soil classification and evaluation methods are generally based on manually collected data and expert experience, and they struggle with the diverse soil types and complex soil characteristics of southern hilly areas. In addition, traditional evaluation methods often cannot take full account of the complex correlations among soil indexes, so it is difficult to give accurate planting suggestions and soil improvement plans.
Three-dimensional modeling with a machine learning method, in particular the support vector machine (SVM), can handle large amounts of complex soil data and uncover nonlinear relationships among soil features, and thus classify and model soil accurately. The SVM method generalizes well and handles high-dimensional data, so it copes better with the complexity and diversity of soil data in southern hilly areas and yields more accurate planting suggestions and soil improvement schemes.
Therefore, a machine learning method, in particular three-dimensional modeling with an SVM, can make up for the shortcomings of traditional methods in soil classification and evaluation and provide more accurate and scientific soil information and crop planting suggestions.
Disclosure of Invention
The invention aims to provide a machine learning-based three-dimensional modeling method for southern hilly soil.
The problem to be solved by the invention is as follows: soil data of southern hilly areas are collected, including soil type, texture, water content, organic matter content and pH value, and the southern hilly soil is modeled in three dimensions with the SVM machine learning method, so that the soil can be classified, farmers can select suitable crops, and crop yield and quality are improved; indexes such as the nutrient content and the acidity or alkalinity of the soil can also be understood, so that a suitable soil improvement plan can be formulated.
The machine learning-based three-dimensional modeling method for southern hilly soil comprises the following steps:
S1: collecting soil in the southern hills with a sampling tool, designing a sampling network according to the slope and aspect of the terrain, and sampling soil in different seasons, wherein the sampling depth is 2/3 of the soil layer depth, 3-5 samples are taken at each sampling point and mixed into one uniform sample, the collected samples are labeled, the sampling point number, sampling date and sampling depth are recorded, and the collected data comprise soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data;
S2: preprocessing the collected soil data by checking the data for missing values, abnormal values and erroneous values, filling missing values by interpolation, deleting abnormal and erroneous values, standardizing the water content, organic matter content and pH data, applying a moving average and seasonal decomposition to the humidity data, normalizing the topographic data and vegetation data, and converting the soil type and texture data into numerical codes;
S3: applying principal component analysis (PCA) for dimension reduction to the preprocessed data, which comprise soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data, adding terrain slope and aspect features and vegetation seasonal variation features, and selecting the features used for machine learning;
S4: dividing the dimension-reduced soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data into a training set and a test set, with 85% of the data in the training set and 15% in the test set;
S5: building an SVM model with the SVM machine learning algorithm, training it on the training set, and optimizing the model through the regularization parameter C and the bandwidth parameter gamma of the Gaussian kernel function;
S6: evaluating the performance of the model on the test set, using the root mean square error (RMSE) and the coefficient of determination (R-squared);
S7: after the model passes evaluation, using the model to model soil data for new southern hilly areas.
Further, designing a sampling network according to the slope and aspect of the terrain and sampling soil in different seasons in step S1 comprises:
performing terrain analysis of the area to be sampled with a digital elevation model and dividing it into slope intervals, where a slope of 0-15 degrees is gentle, 15-30 degrees is moderate and more than 30 degrees is steep, customizing the sampling grid density and layout according to the terrain features, and increasing the sampling point density in steeper areas;
arranging sampling points on slopes of different aspects according to the orientation of the hilly terrain, including north-south and east-west slopes, arranging sampling points in areas where the difference between shady and sunny slopes is pronounced, and arranging sampling points at typical micro-terrain features;
making separate sampling plans for the southern rainy season and dry season, so as to capture the saturated state of soil moisture in the rainy season and the moisture holding capacity in the dry season, as well as the state of soil organic matter decomposition and nutrient cycling in different seasons, and scheduling follow-up sampling after extreme weather events such as typhoons and rainstorms.
Further, the methods for acquiring the soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data in S1 comprise:
recording the type of each soil sample, such as red soil, yellow soil or brown soil, according to the soil classification system; describing the size and composition of the soil particles with a texture classification method and recording the result as texture data, such as sandy soil, loam or silt soil; measuring the water content of the soil by the resistance method; measuring the organic matter content of the soil by the loss-on-ignition method; measuring the pH value of the soil with an electronic pH meter; acquiring digital elevation model data of the area and recording them as topographic data; acquiring remote sensing data on vegetation cover type and density and recording them as vegetation data; and acquiring climate data of the area, namely precipitation and relative humidity, and recording them as the humidity data of the soil.
Further, in step S2, deleting abnormal and erroneous values, standardizing the water content, organic matter content and pH data, applying a moving average and seasonal decomposition to the humidity data, normalizing the topographic data and vegetation data, and converting the soil type and texture data into numerical codes comprises:
for the pH data, plotting a scatter diagram to visualize the data, quantitatively identifying pH data points that deviate significantly from the population with the Grubbs test, and, drawing on soil science knowledge, recognizing as normal those pH values that fall outside the conventional range because of specific soil types of the southern hills, such as red soil and brick-red soil rich in iron and aluminum oxides;
standardizing the data by converting the topographic data and vegetation data into data with a mean of 0 and a variance of 1;
for the humidity data, smoothing with a moving average whose window size is set according to the rainfall frequency of the southern hilly area: calculating the mean and standard deviation of the annual number of rainy days from 10 years of statistics, defining years whose number of rainy days exceeds the mean plus one standard deviation as having a higher rainfall frequency and years whose number of rainy days is below the mean minus one standard deviation as having a lower rainfall frequency, setting the window to 7 days for higher rainfall frequency, 21 days for lower rainfall frequency and 14 days otherwise, applying the moving average to the humidity series of each sampling point by calculating the mean of the humidity data within the window around each data point, and replacing the original data point with that mean;
performing seasonal decomposition on the smoothed humidity data: dividing the soil humidity data by year, decomposing each year's data into a trend component, a seasonal component and a random component with the X-13ARIMA-SEATS seasonal adjustment method, processing the seasonal component by identifying and subtracting the humidity peaks caused by seasonal rainfall, and recombining the processed seasonal component with the trend and random components to produce a seasonally adjusted humidity series;
normalizing the topographic data, vegetation data and humidity data by Min-Max normalization, $x_{\mathrm{normalized}} = \dfrac{x - \min(x)}{\max(x) - \min(x)}$, where x is the raw data, x_normalized is the normalized data, max(x) is the maximum value in the raw data and min(x) is the minimum value in the raw data;
converting the soil type and texture data into numerical codes, wherein red soil, yellow soil and brown soil in the soil type data are encoded as [1,0,0], [0,1,0] and [0,0,1] respectively, and sandy soil, loam and silt soil in the texture data are encoded as [2,0,0], [0,2,0] and [0,0,2] respectively.
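As a non-limiting illustration of the Grubbs screening of the pH data, the following Python sketch tests whether the most extreme reading is a statistical outlier. The function name `grubbs_outlier`, the significance level of 0.05 and the sample readings are assumptions made for illustration and are not taken from the source. A flagged point is only a candidate: in line with the text above, a reading from iron- and aluminum-rich red soil may be legitimate and should be reviewed before deletion.

```python
import numpy as np
from scipy import stats


def grubbs_outlier(values, alpha=0.05):
    """Two-sided Grubbs test: return the index of the most extreme value
    if it is a significant outlier at level alpha, otherwise None."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        return None  # the test is undefined for fewer than 3 points
    mean, sd = x.mean(), x.std(ddof=1)
    if sd == 0:
        return None  # all values identical, nothing to flag
    idx = int(np.argmax(np.abs(x - mean)))
    g = abs(x[idx] - mean) / sd
    # Critical value derived from the t distribution with n - 2 degrees of freedom.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx if g > g_crit else None


# Hypothetical pH readings; the last one looks like a recording error.
ph = [4.8, 5.1, 4.9, 5.3, 5.0, 5.2, 4.7, 9.6]
suspect = grubbs_outlier(ph)
```

The test examines one point per call, so in practice it would be repeated after each confirmed removal until no further point is flagged.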
Further, the dimension reduction by principal component analysis (PCA) in step S3 comprises the following steps:
S31: calculating the covariance matrix of the preprocessed data;
S32: performing eigenvalue decomposition on the covariance matrix to obtain the eigenvalues and the corresponding eigenvectors;
S33: sorting the eigenvalues by magnitude and selecting the eigenvectors corresponding to the k largest eigenvalues, where k is chosen according to the cumulative contribution rate of the eigenvalues;
S34: applying a linear transformation to the original data with the selected eigenvectors, mapping the data into a new low-dimensional space.
Further, adding the terrain slope and aspect features and the vegetation seasonal variation features and selecting the features used for machine learning in step S3 comprises:
obtaining from the digital elevation model a raster elevation data set that covers the sampled area and contains geographic coordinates and the corresponding elevation values, checking the quality of the data and correcting abnormal values, calculating the slope and the aspect of each grid cell with GIS software, spatially registering the resulting slope and aspect rasters with the existing soil attribute data, extracting the slope and aspect values at each soil sample point, and adding them to the feature matrix as additional features;
performing time series analysis on the normalized vegetation data, extracting periodicity and seasonal trends by fast Fourier transform (FFT) analysis, and, based on the results, constructing key features that reflect seasonal variation, namely the seasonal mean, maximum, minimum, peak-valley difference and seasonal index.
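As a non-limiting illustration of deriving slope and slope direction (aspect) from a gridded digital elevation model, the following Python sketch uses finite differences in place of GIS software. It assumes a north-up grid with square cells, with row 0 at the northern edge; the function name `slope_aspect` and the `cell_size` parameter are hypothetical. GIS packages typically use a 3x3 neighborhood estimator, so their values may differ slightly.

```python
import numpy as np


def slope_aspect(dem, cell_size):
    """Return slope (degrees) and aspect (degrees clockwise from north)
    for each cell of a north-up elevation grid (row 0 = northern edge)."""
    dem = np.asarray(dem, dtype=float)
    # np.gradient returns the derivative along rows (southward), then columns (eastward).
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    dz_dx = dz_dcol    # elevation change toward the east
    dz_dy = -dz_drow   # elevation change toward the north
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the downslope direction, opposite to the gradient vector.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Hypothetical 3x3 grid rising 1 m per 1 m cell toward the east.
dem = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
slope, aspect = slope_aspect(dem, cell_size=1.0)
```

In this example every cell has a slope of 45 degrees and faces west (aspect 270 degrees). On perfectly flat cells the aspect is undefined and the sketch returns 0, which would need separate handling before the value is used as a feature.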
Further, optimizing the model in step S5 through the regularization parameter C and the bandwidth parameter γ of the Gaussian kernel function comprises:
the Gaussian kernel function is $K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right)$, where x is a sample data point for which the correlation is to be calculated, x_i is another sample data point in the sample data set, $\lVert x - x_i \rVert$ denotes the Euclidean distance, and γ is the bandwidth parameter of the Gaussian kernel function; different values of C and γ are tried by grid search, and the best-performing parameter combination is selected.
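As a non-limiting illustration, the following Python sketch evaluates a Gaussian kernel between two sets of sample points. It assumes the commonly used form K(x, x_i) = exp(-gamma * ||x - x_i||^2); the function name `gaussian_kernel` and the sample points are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Y, gamma):
    """Return the matrix K with K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


points = [[0.0, 0.0], [1.0, 0.0]]
K = gaussian_kernel(points, points, gamma=0.5)
```

A larger gamma makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood; this is why gamma is tuned together with C.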
The beneficial effects of the invention are as follows: soil data of southern hilly areas are collected, including soil type, texture, water content, organic matter content and pH value, and the southern hilly soil is modeled in three dimensions with the SVM machine learning method, so that the soil can be classified, farmers can select suitable crops, and crop yield and quality are improved; the method also helps in understanding soil indexes such as nutrient content and acidity or alkalinity, so that a suitable soil improvement plan can be formulated.
Drawings
Fig. 1 is a flowchart of a machine learning-based three-dimensional modeling method for southern hilly soil.
Detailed Description
The present invention is described more fully below, but the scope of the invention is not limited thereto.
The machine learning-based three-dimensional modeling method for southern hilly soil comprises the following steps:
S1: collecting soil in the southern hills with a sampling tool, designing a sampling network according to the slope and aspect of the terrain, and sampling soil in different seasons, wherein the sampling depth is 2/3 of the soil layer depth, 3-5 samples are taken at each sampling point and mixed into one uniform sample, the collected samples are labeled, the sampling point number, sampling date and sampling depth are recorded, and the collected data comprise soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data;
S2: preprocessing the collected soil data by checking the data for missing values, abnormal values and erroneous values, filling missing values by interpolation, deleting abnormal and erroneous values, standardizing the water content, organic matter content and pH data, applying a moving average and seasonal decomposition to the humidity data, normalizing the topographic data and vegetation data, and converting the soil type and texture data into numerical codes;
S3: applying principal component analysis (PCA) for dimension reduction to the preprocessed data, which comprise soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data, adding terrain slope and aspect features and vegetation seasonal variation features, and selecting the features used for machine learning;
S4: dividing the dimension-reduced soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data into a training set and a test set, with 85% of the data in the training set and 15% in the test set;
S5: building an SVM model with the SVM machine learning algorithm, training it on the training set, and optimizing the model through the regularization parameter C and the bandwidth parameter gamma of the Gaussian kernel function;
S6: evaluating the performance of the model on the test set, using the root mean square error (RMSE) and the coefficient of determination (R-squared);
S7: after the model passes evaluation, using the model to model soil data for new southern hilly areas.
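As a non-limiting illustration of the 85%/15% data split and of the two evaluation measures named in the steps above, the following Python sketch implements them with NumPy. The function names and the random seed are assumptions made for illustration; the source does not state whether the split is random or stratified, so a simple random shuffle is assumed here.

```python
import numpy as np


def train_test_split_85_15(X, y, seed=0):
    """Shuffle the rows and split them into 85% training and 15% test data."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(0.85 * len(X)))
    train, test = order[:n_train], order[n_train:]
    return X[train], X[test], y[train], y[test]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A lower RMSE and an R-squared closer to 1 indicate a better fit. The threshold at which the model "passes" evaluation is not given in the source and would have to be chosen for the application.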
Further, designing a sampling network according to the slope and aspect of the terrain and sampling soil in different seasons in step S1 comprises:
performing terrain analysis of the area to be sampled with a digital elevation model and dividing it into slope intervals, where a slope of 0-15 degrees is gentle, 15-30 degrees is moderate and more than 30 degrees is steep, customizing the sampling grid density and layout according to the terrain features, and increasing the sampling point density in steeper areas;
arranging sampling points on slopes of different aspects according to the orientation of the hilly terrain, including north-south and east-west slopes, arranging sampling points in areas where the difference between shady and sunny slopes is pronounced, and arranging sampling points at typical micro-terrain features;
making separate sampling plans for the southern rainy season and dry season, so as to capture the saturated state of soil moisture in the rainy season and the moisture holding capacity in the dry season, as well as the state of soil organic matter decomposition and nutrient cycling in different seasons, and scheduling follow-up sampling after extreme weather events such as typhoons and rainstorms.
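As a non-limiting illustration of the slope intervals described above, the following Python sketch maps a slope value to its interval and to a sampling grid spacing. The spacing values in `GRID_SPACING_M` are hypothetical, since the source only states that steeper areas receive denser sampling. Assigning exactly 15 degrees to the gentle interval is also an assumption, because the stated intervals share that boundary.

```python
def slope_class(slope_deg):
    """Map a slope in degrees to the interval names used in the text."""
    if slope_deg < 0:
        raise ValueError("slope must be non-negative")
    if slope_deg <= 15:
        return "gentle"
    if slope_deg <= 30:
        return "moderate"
    return "steep"


# Hypothetical grid spacing in metres per interval: denser sampling on steeper ground.
GRID_SPACING_M = {"gentle": 200, "moderate": 100, "steep": 50}


def grid_spacing(slope_deg):
    """Return the assumed sampling grid spacing for a given slope."""
    return GRID_SPACING_M[slope_class(slope_deg)]
```

For example, `grid_spacing(40)` returns the steep-interval spacing, so a 40-degree hillside would be sampled on the finest grid.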
Further, the methods for acquiring the soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data in S1 comprise:
recording the type of each soil sample, such as red soil, yellow soil or brown soil, according to the soil classification system; describing the size and composition of the soil particles with a texture classification method and recording the result as texture data, such as sandy soil, loam or silt soil; measuring the water content of the soil by the resistance method; measuring the organic matter content of the soil by the loss-on-ignition method; measuring the pH value of the soil with an electronic pH meter; acquiring digital elevation model data of the area and recording them as topographic data; acquiring remote sensing data on vegetation cover type and density and recording them as vegetation data; and acquiring climate data of the area, namely precipitation and relative humidity, and recording them as the humidity data of the soil.
Further, in step S2, deleting abnormal and erroneous values, standardizing the water content, organic matter content and pH data, applying a moving average and seasonal decomposition to the humidity data, normalizing the topographic data and vegetation data, and converting the soil type and texture data into numerical codes comprises:
for the pH data, plotting a scatter diagram to visualize the data, quantitatively identifying pH data points that deviate significantly from the population with the Grubbs test, and, drawing on soil science knowledge, recognizing as normal those pH values that fall outside the conventional range because of specific soil types of the southern hills, such as red soil and brick-red soil rich in iron and aluminum oxides;
standardizing the data by converting the topographic data and vegetation data into data with a mean of 0 and a variance of 1;
for the humidity data, smoothing with a moving average whose window size is set according to the rainfall frequency of the southern hilly area: calculating the mean and standard deviation of the annual number of rainy days from 10 years of statistics, defining years whose number of rainy days exceeds the mean plus one standard deviation as having a higher rainfall frequency and years whose number of rainy days is below the mean minus one standard deviation as having a lower rainfall frequency, setting the window to 7 days for higher rainfall frequency, 21 days for lower rainfall frequency and 14 days otherwise, applying the moving average to the humidity series of each sampling point by calculating the mean of the humidity data within the window around each data point, and replacing the original data point with that mean;
performing seasonal decomposition on the smoothed humidity data: dividing the soil humidity data by year, decomposing each year's data into a trend component, a seasonal component and a random component with the X-13ARIMA-SEATS seasonal adjustment method, processing the seasonal component by identifying and subtracting the humidity peaks caused by seasonal rainfall, and recombining the processed seasonal component with the trend and random components to produce a seasonally adjusted humidity series;
normalizing the topographic data, vegetation data and humidity data by Min-Max normalization, $x_{\mathrm{normalized}} = \dfrac{x - \min(x)}{\max(x) - \min(x)}$, where x is the raw data, x_normalized is the normalized data, max(x) is the maximum value in the raw data and min(x) is the minimum value in the raw data;
converting the soil type and texture data into numerical codes, wherein red soil, yellow soil and brown soil in the soil type data are encoded as [1,0,0], [0,1,0] and [0,0,1] respectively, and sandy soil, loam and silt soil in the texture data are encoded as [2,0,0], [0,2,0] and [0,0,2] respectively.
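As a non-limiting illustration of several of the preprocessing operations above, the following Python sketch selects the smoothing window from the rainfall frequency, applies a moving average, performs Min-Max normalization and encodes categories as indicator vectors. All function names are hypothetical. The sketch assumes a centered window truncated at the ends of the series and the sample standard deviation; the source does not specify either choice.

```python
import numpy as np


def window_days(rain_days_this_year, rain_days_history):
    """Pick the smoothing window (7, 14 or 21 days) from the rainfall frequency."""
    hist = np.asarray(rain_days_history, dtype=float)
    mean, sd = hist.mean(), hist.std(ddof=1)
    if rain_days_this_year > mean + sd:
        return 7    # higher rainfall frequency: short window
    if rain_days_this_year < mean - sd:
        return 21   # lower rainfall frequency: long window
    return 14


def moving_average(series, window):
    """Centered moving average; windows are truncated at the series ends."""
    x = np.asarray(series, dtype=float)
    half = window // 2
    return np.array([x[max(0, i - half): i + half + 1].mean() for i in range(len(x))])


def min_max(x):
    """Scale values to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def one_hot(labels, categories):
    """Encode each label as a 0/1 vector in the order given by categories."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)), dtype=int)
    for row, label in enumerate(labels):
        out[row, index[label]] = 1
    return out
```

For example, `one_hot(["red soil", "brown soil"], ["red soil", "yellow soil", "brown soil"])` yields the rows [1, 0, 0] and [0, 0, 1]. Multiplying the result by 2 gives indicator vectors whose active value is 2, as used for the texture codes in the text.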
Further, the dimension reduction by principal component analysis (PCA) in step S3 comprises the following steps:
S31: calculating the covariance matrix of the preprocessed data;
S32: performing eigenvalue decomposition on the covariance matrix to obtain the eigenvalues and the corresponding eigenvectors;
S33: sorting the eigenvalues by magnitude and selecting the eigenvectors corresponding to the k largest eigenvalues, where k is chosen according to the cumulative contribution rate of the eigenvalues;
S34: applying a linear transformation to the original data with the selected eigenvectors, mapping the data into a new low-dimensional space.
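As a non-limiting illustration, steps S31 to S34 can be sketched in Python with NumPy as follows. The function name `pca` and the cumulative contribution threshold of 0.95 are assumptions; the source states only that k is chosen from the cumulative contribution rate without giving a value.

```python
import numpy as np


def pca(X, variance_target=0.95):
    """PCA by eigendecomposition of the covariance matrix (steps S31 to S34)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # S31: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # S32: eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]        # S33: sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose cumulative contribution rate reaches the target.
    k = int(np.searchsorted(cumulative, variance_target) + 1)
    k = min(k, X.shape[1])
    return Xc @ eigvecs[:, :k], cumulative[:k]   # S34: project onto the k components


# Hypothetical data: two strongly correlated features collapse to one component.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])
Z, contribution = pca(X)
```

Because the eigenvectors are computed from the training data, the same mean and eigenvectors would need to be reused when projecting new samples, so that training and prediction share one low-dimensional space.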
Further, adding the terrain slope and aspect features and the vegetation seasonal variation features and selecting the features used for machine learning in step S3 comprises:
obtaining from the digital elevation model a raster elevation data set that covers the sampled area and contains geographic coordinates and the corresponding elevation values, checking the quality of the data and correcting abnormal values, calculating the slope and the aspect of each grid cell with GIS software, spatially registering the resulting slope and aspect rasters with the existing soil attribute data, extracting the slope and aspect values at each soil sample point, and adding them to the feature matrix as additional features;
performing time series analysis on the normalized vegetation data, extracting periodicity and seasonal trends by fast Fourier transform (FFT) analysis, and, based on the results, constructing key features that reflect seasonal variation, namely the seasonal mean, maximum, minimum, peak-valley difference and seasonal index.
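As a non-limiting illustration of the FFT-based seasonal feature construction, the following Python sketch finds the dominant period of a vegetation series and summarizes its average within-year profile. The function name `seasonal_features` is hypothetical. Defining the seasonal index as the within-year profile divided by its mean is an assumption, as the source does not define it. The series is assumed to be regularly sampled and to cover at least one full year.

```python
import numpy as np


def seasonal_features(series, samples_per_year):
    """Dominant period via FFT plus summary statistics of the seasonal profile."""
    x = np.asarray(series, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0)       # cycles per sample
    peak = int(np.argmax(spectrum[1:]) + 1)      # skip the zero-frequency bin
    period = 1.0 / freqs[peak]                   # dominant period in samples
    # Average the series by position within the year to get a seasonal profile.
    n_years = len(x) // samples_per_year
    profile = x[: n_years * samples_per_year].reshape(n_years, samples_per_year).mean(axis=0)
    return {
        "dominant_period": period,
        "seasonal_mean": float(profile.mean()),
        "seasonal_max": float(profile.max()),
        "seasonal_min": float(profile.min()),
        "peak_valley_difference": float(profile.max() - profile.min()),
        "seasonal_index": profile / profile.mean(),
    }


# Hypothetical monthly vegetation index over 4 years with an annual cycle.
months = np.arange(48)
ndvi = 0.5 + 0.2 * np.sin(2 * np.pi * months / 12)
features = seasonal_features(ndvi, samples_per_year=12)
```

For this synthetic series the dominant period is 12 samples, confirming an annual cycle, and the peak-valley difference equals twice the amplitude of the sine term.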
Further, optimizing the model in step S5 through the regularization parameter C and the bandwidth parameter γ of the Gaussian kernel function comprises:
the Gaussian kernel function is $K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right)$, where x is a sample data point for which the correlation is to be calculated, x_i is another sample data point in the sample data set, $\lVert x - x_i \rVert$ denotes the Euclidean distance, and γ is the bandwidth parameter of the Gaussian kernel function; different values of C and γ are tried by grid search, and the best-performing parameter combination is selected.
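As a non-limiting illustration of the grid search over C and gamma, the following Python sketch uses scikit-learn. It assumes a regression formulation (`SVR`), because the evaluation measures named in the method, RMSE and R-squared, are regression measures; a classification task would use `SVC` with classification measures instead. The synthetic data, the parameter grid and the 5-fold cross-validation are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the feature matrix and a continuous soil attribute.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=200)

# 85% training data, 15% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

# Gaussian (RBF) kernel SVM; C and gamma are the two tuned parameters.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [0.1, 1.0, 10.0, 100.0],
    "svr__gamma": [0.01, 0.1, 1.0],
}
search = GridSearchCV(model, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

# Evaluate the best parameter combination on the held-out test set.
best_params = search.best_params_
y_pred = search.predict(X_test)
test_rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
test_r2 = float(r2_score(y_test, y_pred))
```

Selecting the parameters by cross-validation on the training set keeps the test set untouched until the final evaluation, so the reported RMSE and R-squared are not biased by the search itself.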
The invention provides a machine learning-based three-dimensional modeling method for southern hilly soil. Soil data of southern hilly areas are collected, including soil type, texture, water content, organic matter content and pH value, and the southern hilly soil is modeled in three dimensions with the SVM machine learning method, so that the soil can be classified, suitable crops can be selected, and crop yield and quality can be improved. The method also helps in understanding soil indexes such as nutrient content and acidity or alkalinity, so that a suitable soil improvement plan can be formulated.
Claims (7)
1. The machine learning-based three-dimensional modeling method for the southern hilly soil is characterized by comprising the following steps of:
S1: collecting soil in the southern hills with a sampling tool; designing a sampling network according to the gradient and orientation characteristics of the terrain; sampling soil in different seasons, with a sampling depth of 2/3 of the soil layer depth; taking 3-5 samples at each sampling point and mixing them into one uniform sample; labeling the collected samples and recording the sampling point number, sampling date and sampling depth; the collected data comprise soil type, texture, water content, organic matter content, pH value, terrain data, vegetation data and humidity data;
S2: preprocessing the collected soil data: checking whether the data contain missing values, abnormal values or error values; filling the missing values by interpolation; screening out and deleting the abnormal values and error values; standardizing the water content, organic matter content and pH value data; applying a moving average and seasonal decomposition to the humidity data; normalizing the topographic data and the vegetation data; and converting the soil type and texture data into numerical codes;
S3: performing dimension-reduction optimization by principal component analysis (PCA) on the preprocessed data, which comprise soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data; adding terrain gradient and slope direction features and vegetation seasonal variation features; and performing feature engineering for machine learning;
S4: dividing the soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data after the dimension reduction optimization into a training set and a testing set, wherein 85% of the data are the training set and 15% of the data are the testing set;
S5: using a machine learning SVM algorithm to establish an SVM model, using a training set to train, and optimizing the model by adopting a regularization parameter C and a bandwidth parameter gamma in a Gaussian kernel function;
S6: using the test set to evaluate the performance of the model, with the root mean square error (RMSE) and the coefficient of determination (R-squared) as evaluation metrics;
S7: after the model passes the evaluation, using the model to predict and model soil data of a new southern hilly area.
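The data split in S4 and the two evaluation metrics in S6 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the claimed implementation: the split is a simple random shuffle with a fixed seed, and the function names and the `seed` parameter are hypothetical.

```python
import math
import random

def train_test_split(samples, train_fraction=0.85, seed=0):
    """Shuffle the samples and split them 85% / 15% by default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A lower RMSE and an R-squared closer to 1 on the test set indicate a better fit; the threshold at which the model "passes" is not stated in the source.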
2. The machine learning-based three-dimensional modeling method for southern hilly soil according to claim 1, wherein the step S1 of designing a sampling network according to the slope and orientation characteristics of the terrain, and performing soil sampling in different seasons comprises:
performing terrain analysis on the area where soil is to be collected using a digital elevation model, and dividing the terrain into gradient intervals: gentle for gradients of 0-15 degrees, moderate for 15-30 degrees and steep for more than 30 degrees; customizing the sampling grid density and layout according to the different terrain features, and increasing the sampling point density in areas with steeper gradients;
arranging sampling points on slopes of different orientations according to the orientation of the hilly terrain, namely north-south slopes and east-west slopes; arranging sampling points in areas where the difference between shady and sunny slopes is pronounced; and arranging sampling points at typical micro-terrain features;
making different sampling plans for the rainy season and the dry season in the south, so as to capture the saturated state of soil moisture in the rainy season and the moisture-holding capacity in the dry season, as well as the state of soil organic matter decomposition and nutrient element cycling in different seasons; and arranging follow-up sampling after extreme weather events such as typhoons and rainstorms.
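The gradient intervals used for the sampling design can be expressed as a small classification function. The source does not say which class the boundary values belong to; the sketch below assumes that exactly 15 degrees counts as moderate and exactly 30 degrees also counts as moderate, since steep is defined as more than 30 degrees. The function name is hypothetical.

```python
def slope_class(slope_deg):
    """Map a gradient in degrees to the gentle / moderate / steep classes.

    Boundary handling is an assumption: [0, 15) gentle, [15, 30] moderate, > 30 steep.
    """
    if slope_deg < 0:
        raise ValueError("gradient must be non-negative")
    if slope_deg < 15:
        return "gentle"
    if slope_deg <= 30:
        return "moderate"
    return "steep"
```

Counting the grid cells that fall into each class gives a simple basis for allocating more sampling points to steeper areas.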
3. The machine learning-based southern hilly soil three-dimensional modeling method according to claim 1, wherein the method for collecting the soil type, texture, water content, organic matter content, pH value, topography data, vegetation data and humidity data in S1 comprises:
According to the soil classification system, recording the type data of the soil samples, namely red soil, yellow soil and brown soil; describing the size and composition of the soil particles by a texture classification method and recording them as texture data, namely sandy soil, loam and silt soil; measuring the moisture content of the soil by the resistance method; measuring the organic matter content of the soil by the loss-on-ignition method; measuring the pH value of the soil with an electronic pH meter; acquiring digital elevation model data of the region and recording them as terrain data; acquiring remote sensing data on vegetation cover type and density and recording them as vegetation data; and acquiring the climate data of the region, namely precipitation and relative humidity, and recording them as humidity data of the soil.
4. The machine learning-based southern hilly soil three-dimensional modeling method according to claim 1, wherein, in step S2, screening out and deleting the abnormal values and error values, standardizing the water content, organic matter content and pH value data, applying a moving average and seasonal decomposition to the humidity data, normalizing the topography data and vegetation data, and converting the soil type and texture data into numerical codes comprises:
For the pH value data, plotting a scatter diagram to visualize the data, quantitatively identifying pH data points that deviate significantly from the group by the Grubbs test, and, drawing on soil science knowledge, recognizing as normal those deviations of pH from the conventional range that are caused by the specific soil types of the southern hills, namely red soil and laterite, which are rich in iron and aluminum oxides;
Converting the topographic data and vegetation data into standard normal distribution data with a mean of 0 and a variance of 1;
For the humidity data, smoothing the data by a moving average method, with the size of the sliding window set according to the rainfall frequency of the southern hilly area: based on 10 years of statistical data, calculating the mean and standard deviation of the annual number of rainfall days; defining years in which the number of rainfall days is higher than the mean plus one standard deviation as having a higher rainfall frequency, and years in which it is lower than the mean minus one standard deviation as having a lower rainfall frequency; setting the window to 7 days for the higher rainfall frequency, 21 days for the lower rainfall frequency and 14 days for all other rainfall frequencies; applying the moving average algorithm to the humidity data sequence of each sampling point, calculating the mean of the humidity data within the window-size days before and after each data point, and replacing the original data point with that mean;
Carrying out seasonal decomposition on the smoothed humidity data, dividing the soil humidity data according to the years, decomposing each annual data into a trend item, a seasonal item and a random item by using a seasonal decomposition method X-13ARIMA-SEATS, processing the seasonal item, identifying and subtracting a humidity peak value caused by seasonal rainfall, and recombining the processed seasonal item, the trend item and the random item to generate a seasonally adjusted humidity data sequence;
the topography data, vegetation data and humidity data are normalized by Min-Max normalization, $x_{normalized} = \dfrac{x - \min(x)}{\max(x) - \min(x)}$, where $x$ is the raw data, $x_{normalized}$ is the normalized data, $\max(x)$ is the maximum value in the raw data, and $\min(x)$ is the minimum value in the raw data;
The soil type and texture data are converted into numerical codes: the soil types red soil, yellow soil and brown soil are coded as [1,0,0], [0,1,0] and [0,0,1] respectively, and the textures sandy soil, loam and silt soil are coded as [2,0,0], [0,2,0] and [0,0,2] respectively.
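Two of the preprocessing steps in this claim, the rainfall-dependent moving average and the Min-Max normalization, can be sketched as follows. The sketch makes several assumptions that the source leaves open: the sample standard deviation is used, the window is centered on each data point and shortened at the ends of the series, and a constant series is mapped to zeros. All function names are hypothetical.

```python
import statistics

def window_for_year(rain_days, all_years_rain_days):
    """Pick the smoothing window (days) from a year's number of rainfall days."""
    mean = statistics.mean(all_years_rain_days)
    sd = statistics.stdev(all_years_rain_days)  # sample SD: an assumption
    if rain_days > mean + sd:
        return 7    # higher rainfall frequency
    if rain_days < mean - sd:
        return 21   # lower rainfall frequency
    return 14       # all other years

def moving_average(series, window):
    """Centered moving average; the window shrinks near both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

def min_max(series):
    """Scale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # constant series: assumed convention
    return [(v - lo) / (hi - lo) for v in series]
```

In use, `window_for_year` would be evaluated once per year against the 10-year record, and the resulting window passed to `moving_average` for that year's daily humidity sequence.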
5. A machine learning based southern hilly soil three-dimensional modeling method as defined in claim 1, wherein S3 is optimized for dimension reduction using principal component analysis PCA, comprising the steps of:
S31: calculating a covariance matrix from the preprocessed data;
S32: performing eigenvalue decomposition on the covariance matrix to obtain the eigenvalues and the corresponding eigenvectors;
S33: sorting the eigenvalues by magnitude and selecting the eigenvectors corresponding to the k largest eigenvalues, where k is chosen on the basis of the cumulative contribution rate of the eigenvalues;
S34: linearly transforming the original data with the selected eigenvectors, thereby mapping the data into a new low-dimensional space.
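The selection of k in step S33 can be illustrated in isolation, taking the eigenvalues of the covariance matrix as already computed. The cumulative contribution rate is the running sum of the sorted eigenvalues divided by their total. The 0.95 default threshold is an assumption; the source does not state a value, and the function name is hypothetical.

```python
def choose_k(eigenvalues, threshold=0.95):
    """Return the smallest k whose cumulative contribution rate reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)
```

The eigenvectors belonging to the first k sorted eigenvalues then form the projection matrix used in step S34.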
6. The machine learning-based three-dimensional modeling method for southern hilly soil according to claim 1, wherein adding the terrain gradient and slope direction features and the vegetation seasonal variation features and performing feature engineering for machine learning in step S3 comprises:
Acquiring, from a digital elevation model, a raster elevation data set that covers the sampled area and contains geographic coordinates with their corresponding elevation values; performing quality inspection on the data and correcting abnormal values; calculating the gradient (slope angle) and the slope direction (aspect) of each grid cell using GIS software; spatially registering the calculated gradient and slope direction raster data with the existing soil attribute data; extracting the gradient and slope direction values corresponding to each soil sample point; and adding these values to the feature matrix as additional features;
Performing time-series analysis on the normalized vegetation data, extracting the seasonal trend and periodicity by fast Fourier transform (FFT) analysis, and, based on the analysis results, constructing key features that reflect seasonal change, namely the seasonal mean, maximum, minimum, peak-to-valley difference and seasonal index.
7. The machine learning-based southern hilly soil three-dimensional modeling method as defined in claim 1, wherein optimizing the model in S5 by using the regularization parameter C and the bandwidth parameter γ of the Gaussian kernel function includes:
the Gaussian kernel function is $K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right)$, where $x$ is the sample data point for which the correlation is to be calculated, $x_i$ is another sample data point in the sample data set, $\lVert x - x_i \rVert$ represents the Euclidean distance, and $\gamma$ is the bandwidth parameter of the Gaussian kernel function; different values of C and γ are tried by a grid search method, and the parameter combination with the best performance is selected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410881760.1A CN118429564A (en) | 2024-07-03 | 2024-07-03 | Machine learning-based three-dimensional modeling method for soil of south hills |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118429564A (en) | 2024-08-02 |
Family
ID=92310727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410881760.1A Pending CN118429564A (en) | 2024-07-03 | 2024-07-03 | Machine learning-based three-dimensional modeling method for soil of south hills |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118429564A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||