CN118429564A

CN118429564A - Machine learning-based three-dimensional modeling method for soil of south hills

Info

Publication number: CN118429564A
Application number: CN202410881760.1A
Authority: CN
Inventors: 张军; 王萃; 魏龙; 李欣; 李胜天
Original assignee: Jiangxi Space Geoinformation Engineering Group Co ltd; Geographic Information Engineering Team Of Jiangxi Provincial Bureau Of Geology
Current assignee: Jiangxi Space Geoinformation Engineering Group Co ltd; Geographic Information Engineering Team Of Jiangxi Provincial Bureau Of Geology
Priority date: 2024-07-03
Filing date: 2024-07-03
Publication date: 2024-08-02

Abstract

The invention provides a machine learning-based three-dimensional modeling method for southern hilly soil. Soil data of southern hilly areas, including soil type, texture, water content, organic matter content and pH value, are collected, and the southern hilly soil is modeled in three dimensions using the support vector machine (SVM) machine learning method. The soil can thereby be classified, which helps farmers select suitable crops and improve crop yield and quality, and helps in understanding indexes such as the nutrient content and the acidity or alkalinity of the soil, so that a suitable soil improvement plan can be formulated.

Description

Machine learning-based three-dimensional modeling method for soil of south hills

Technical Field

The invention relates to the field of three-dimensional modeling, in particular to a machine learning-based southern hilly soil three-dimensional modeling method.

Background

Current crop planting and soil management often rely on traditional experience and conventional soil testing methods, which are limited in their ability to classify and evaluate soil accurately and on a large scale.

Traditional soil classification and evaluation methods are generally based on manually collected data and expert experience, and they are limited when faced with the diverse soil types and complex soil characteristics of southern hilly areas. In addition, traditional soil evaluation methods often cannot comprehensively consider the complex correlations among the various soil indexes, so it is difficult for them to provide accurate planting suggestions and soil improvement plans.

When a machine learning method, particularly the support vector machine (SVM), is adopted for three-dimensional modeling, large amounts of complex soil data can be processed more effectively and the nonlinear relations among soil features can be mined, so that accurate classification and modeling of soil are realized. The SVM method has strong generalization capability and high-dimensional data processing capability, so it can better cope with the complexity and diversity of soil data in southern hilly areas and provide more accurate planting suggestions and soil improvement schemes.

Therefore, a machine learning method, in particular three-dimensional modeling with an SVM, can make up for the defects of the traditional methods in soil classification and evaluation and provide more accurate and scientific soil information and crop planting suggestions.

Disclosure of Invention

The invention aims to provide a machine learning-based three-dimensional modeling method for southern hilly soil.

The problem to be solved by the invention is as follows: soil data of the southern hilly areas, including soil type, texture, water content, organic matter content and pH value, are collected, and the soil of the southern hilly areas is modeled in three dimensions using the machine learning method SVM. The soil can thereby be classified, which helps farmers select suitable crops and improve crop yield and quality, and helps in understanding indexes such as the nutrient content and the acidity or alkalinity of the soil, so that a suitable soil improvement plan can be formulated.

The machine learning-based three-dimensional modeling method for the southern hilly soil comprises the following steps of:

S1: a sampling tool is used for soil collection in the southern hills, a sampling network is designed according to the slope and orientation characteristics of the terrain, and soil sampling is carried out in different seasons; the sampling depth is 2/3 of the soil layer depth, 3-5 samples are taken at each sampling point and mixed into one uniform sample, the collected samples are labeled, and the sampling point number, sampling date and sampling depth are recorded; the collected data comprise soil type, texture, water content, organic matter content, pH value, terrain data, vegetation data and humidity data;

S2: preprocessing the collected soil data, checking whether missing values, abnormal values and error values exist in the data, filling the missing values by interpolation, selecting and deleting the abnormal values and the error values, standardizing the water content, organic matter content and pH value data, carrying out moving average and seasonal decomposition on the humidity data, normalizing the topographic data and the vegetation data, and converting the soil type and texture data into numerical codes;

S3: performing dimension reduction optimization by principal component analysis (PCA) on the preprocessed data, which comprise soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data, adding terrain slope and slope direction features and vegetation seasonal variation features, and selecting features for machine learning through feature engineering;

S4: dividing the soil type, texture, water content, organic matter content, pH value, topographic data, vegetation data and humidity data after the dimension reduction optimization into a training set and a testing set, wherein 85% of the data are the training set and 15% of the data are the testing set;

S5: using a machine learning SVM algorithm to establish an SVM model, using a training set to train, and optimizing the model by adopting a regularization parameter C and a bandwidth parameter gamma in a Gaussian kernel function;

S6: using the test set to evaluate the performance of the model, adopting the root mean square error (RMSE) and the coefficient of determination (R-squared) for evaluation;

S7: after the model passes the evaluation, the model is used to model soil data of new southern hilly areas.
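As a minimal sketch of steps S4 and S6, the 85%/15% split and the two evaluation metrics could be written in plain Python as follows. The function names, the fixed random seed and the use of shuffling before the split are assumptions made for illustration; the text does not state how the split is drawn.

```python
import math
import random


def split_train_test(records, train_fraction=0.85, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: random split, fixed seed
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, test = split_train_test(range(100))
print(len(train), len(test))  # 85 15
```

A lower RMSE and an R-squared closer to 1 indicate a better fit on the test set; the pass threshold for S7 is not given in the text and would have to be chosen by the user.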

Further, in the step S1, a sampling network is designed according to the slope and the orientation of the terrain, and soil sampling is performed in different seasons, including:

carrying out terrain analysis on the area where soil is to be collected by means of a digital elevation model, and dividing the terrain into slope intervals, slopes of 0-15 degrees being mild, slopes of 15-30 degrees being moderate and slopes of more than 30 degrees being steep; customizing the sampling grid density and layout according to the different terrain features, and increasing the sampling point density in areas with steeper slopes;

according to the orientation of the hilly terrain, sampling points are arranged on the south-north slopes and the east-west slopes, in areas with obvious differences between shady and sunny (yin-yang) slopes, and at typical micro-terrain features;

different sampling plans are made for the rainy season and the dry season in the south, so as to capture the saturated state of soil moisture in the rainy season and the moisture holding capacity in the dry season, as well as the state of soil organic matter decomposition and nutrient element cycling in different seasons; follow-up sampling is arranged after extreme weather events such as typhoons and rainstorms.
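The slope intervals above can be expressed as a small classification helper. The sketch below is illustrative only: the text does not say which class the boundary values 15 and 30 degrees belong to, so they are assigned to the gentler class here, and the density multipliers are hypothetical numbers chosen only to show sampling density increasing with slope.

```python
import math


def slope_class(slope_deg):
    """Map a slope angle in degrees to the interval names used in the text.

    Assumption: the boundary values 15 and 30 fall into the gentler class.
    """
    if slope_deg < 0:
        raise ValueError("slope must be non-negative")
    if slope_deg <= 15:
        return "mild"
    if slope_deg <= 30:
        return "moderate"
    return "steep"


# Hypothetical multipliers; the text only says density increases with slope.
DENSITY_FACTOR = {"mild": 1.0, "moderate": 1.5, "steep": 2.0}


def points_per_cell(slope_deg, base_points=4):
    """Number of sampling points for a grid cell, scaled by its slope class."""
    return math.ceil(base_points * DENSITY_FACTOR[slope_class(slope_deg)])


print(slope_class(22), points_per_cell(22))  # moderate 6
```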

Further, the method for acquiring the soil type, texture, water content, organic matter content, pH value, topography data, vegetation data and humidity data in S1 includes:

Recording the type data of the soil samples, such as red soil, yellow soil and brown soil, according to the soil classification system; describing the size and composition of the soil particles by a texture classification method and recording them as texture data, such as sandy soil, loam and silt (powder soil); measuring the moisture content of the soil by the resistance method; measuring the organic matter content of the soil by the loss-on-ignition (combustion loss) method; measuring the pH value of the soil with an electronic pH meter; acquiring digital elevation model data of the region and recording them as terrain data; acquiring remote sensing data on vegetation coverage type and density and recording them as vegetation data; obtaining the climate data of the area, namely precipitation and relative humidity, and recording them as the humidity data of the soil.

Further, in the step S2, selecting and deleting the abnormal values and the error values, standardizing the water content, organic matter content and pH value data, carrying out moving average and seasonal decomposition on the humidity data, normalizing the topography data and the vegetation data, and converting the soil type and texture data into numerical codes comprises the following steps:

for the pH value data, building a scatter diagram to visualize the data, quantitatively identifying the pH value data points that deviate significantly from the population by the Grubbs statistical method, and, drawing on soil science knowledge, recognizing as normal those deviations of the pH value from the conventional range that are caused by specific soil types of the southern hills rich in iron and aluminum oxides, such as red soil and brick red soil;

the topographic data and the vegetation data are converted into standard normal distribution data with a mean value of 0 and a variance of 1;

for the humidity data, the data are smoothed by a moving average method whose window size is set according to the rainfall frequency of the southern hilly area. Based on 10 years of statistical data, the mean and standard deviation of the annual number of rainfall days are calculated; years in which the number of rainfall days is higher than the mean plus one standard deviation are defined as having a higher rainfall frequency, and years in which it is lower than the mean minus one standard deviation are defined as having a lower rainfall frequency. The window is set to 7 days for the higher rainfall frequency, 21 days for the lower rainfall frequency and 14 days for the remaining rainfall frequencies. The moving average algorithm is applied to the humidity data sequence of each sampling point: the mean of the humidity data within the window-size days before and after each data point is calculated and replaces the original data point;

carrying out seasonal decomposition on the smoothed humidity data: the soil humidity data are divided by year, each year of data is decomposed into a trend term, a seasonal term and a random term by the seasonal decomposition method X-13ARIMA-SEATS, the seasonal term is processed by identifying and subtracting the humidity peaks caused by seasonal rainfall, and the processed seasonal term is recombined with the trend term and the random term to generate a seasonally adjusted humidity data sequence;

the topography data, vegetation data and humidity data are normalized by Min-Max normalization, $x_{normalized} = \frac{x - \min(x)}{\max(x) - \min(x)}$, where x is the raw data, x_normalized is the normalized data, max(x) is the maximum value in the raw data, and min(x) is the minimum value in the raw data;

the soil type and texture data are converted into numerical codes: the soil type data for red soil, yellow soil and brown soil are coded as [1,0,0], [0,1,0] and [0,0,1] respectively, and the texture data for sandy soil, loam and silt (powder soil) are coded as [2,0,0], [0,2,0] and [0,0,2] respectively.
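The preprocessing operations described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: all function names are hypothetical; the Grubbs helper only computes the test statistic, and the critical value it is compared against must be taken from a Grubbs table for the chosen sample size and significance level; the rainfall classification assumes the population standard deviation; the moving-average window is treated as a total span centered on each point and truncated at the ends of the series; and the one-hot helper uses conventional 0/1 coding.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index): G = max|x_i - mean| / s and the index of that point."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / s, index


def standardize(values):
    """Z-score: rescale to mean 0 and variance 1 (population variance)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rainfall_windows(rain_days_by_year):
    """Pick a 7-, 14- or 21-day window per year from its count of rainy days."""
    counts = list(rain_days_by_year.values())
    mean = statistics.fmean(counts)
    std = statistics.pstdev(counts)  # assumption: population standard deviation
    windows = {}
    for year, days in rain_days_by_year.items():
        if days > mean + std:
            windows[year] = 7    # higher rainfall frequency
        elif days < mean - std:
            windows[year] = 21   # lower rainfall frequency
        else:
            windows[year] = 14
    return windows


def centered_moving_average(series, window):
    """Replace each point by the mean of the window centered on it."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def one_hot(label, categories):
    """Encode a categorical label as a 0/1 vector over the known categories."""
    if label not in categories:
        raise ValueError(f"unknown category: {label!r}")
    return [1 if c == label else 0 for c in categories]


print(min_max_normalize([10, 20, 30]))            # [0.0, 0.5, 1.0]
print(centered_moving_average([1, 2, 3, 4, 5], 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

A pH point flagged by the Grubbs statistic would still need the soil-science check described above before deletion, since strongly acidic readings can be normal for some southern hilly soils.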

Further, the step S3 of performing dimension reduction optimization by using principal component analysis PCA comprises the following steps:

S31: calculating a covariance matrix from the preprocessed data;

S32: the covariance matrix is subjected to eigenvalue decomposition to obtain eigenvalues and corresponding eigenvectors;

S33: sorting the eigenvalues by magnitude and selecting the eigenvectors corresponding to the k largest eigenvalues, wherein k is chosen based on the cumulative contribution rate of the eigenvalues;

S34: carrying out a linear transformation on the original data with the selected eigenvectors, mapping the data into a new low-dimensional space.
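Steps S31 to S34 can be sketched with NumPy as shown below. This is a minimal illustration rather than the patented implementation: the function name is hypothetical, and the 0.95 cumulative contribution threshold used to choose k is an assumed value, since the text does not state one.

```python
import numpy as np


def pca_reduce(data, min_cumulative_contribution=0.95):
    """Project data onto the top-k principal components.

    Returns (projected, k, cumulative_contribution).
    The 0.95 default threshold is an assumption, not a value from the text.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)

    cov = np.cov(centered, rowvar=False)         # S31: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # S32: eigen-decomposition

    order = np.argsort(eigvals)[::-1]            # S33: sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(contribution, min_cumulative_contribution)) + 1
    k = min(k, len(eigvals))                     # guard against round-off

    projected = centered @ eigvecs[:, :k]        # S34: linear transformation
    return projected, k, contribution


# Two perfectly correlated features collapse onto one component.
projected, k, contribution = pca_reduce([[1, 2], [2, 4], [3, 6], [4, 8]])
print(k, projected.shape)  # 1 (4, 1)
```

`np.linalg.eigh` is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, which is why they are re-sorted before selecting k.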

Further, the step S3 of adding features of the slope and the slope direction of the terrain and seasonal variation features of the vegetation, and selecting a feature engineering for machine learning includes:

acquiring from the digital elevation model a grid elevation data set that covers the collection area and contains geographic coordinates and the corresponding elevation values, performing quality inspection on the data and correcting abnormal values, calculating the slope and the slope direction of each grid cell using GIS software, spatially registering the calculated slope and slope direction grid data with the existing soil attribute data, extracting the slope and slope direction values corresponding to each soil sample point, and adding them to the feature matrix as additional features;

performing time series analysis on the normalized vegetation data, extracting the periodic seasonal trend through fast Fourier transform (FFT) analysis, and constructing, based on the analysis results, key features that reflect seasonal change, namely the seasonal mean, maximum, minimum, peak-valley difference and seasonal index.
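The two feature-construction steps above can be illustrated with the following plain-Python sketch, which makes several simplifying assumptions. The slope helper uses a basic central-difference estimate on interior cells, whereas GIS software may apply a different finite-difference scheme; slope direction is omitted because its angular convention differs between tools. The period helper uses a direct discrete Fourier transform, which yields the same spectrum as an FFT but more slowly. The seasonal index is left out because the text does not define it. All function names are hypothetical.

```python
import cmath
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope angle at an interior DEM cell from central differences."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def dominant_period(series):
    """Period (in samples) of the strongest non-constant DFT component."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return n / best_k


def seasonal_features(values):
    """Summary statistics of the vegetation values within one season."""
    return {
        "mean": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
        "peak_valley_difference": max(values) - min(values),
    }


# A synthetic monthly series with a 12-month cycle over four years.
ndvi = [math.sin(2 * math.pi * t / 12) for t in range(48)]
print(dominant_period(ndvi))  # 12.0
```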

Further, the optimization of the S5 model by using the regularization parameter C and the bandwidth parameter γ in the gaussian kernel function includes:

the Gaussian kernel function is $K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right)$, where x is one sample data point for which a correlation is to be calculated, x_i is another sample data point in the sample data set, $\lVert x - x_i \rVert$ represents the Euclidean distance, and γ is the bandwidth parameter of the Gaussian kernel function; different values of C and γ are tried through a grid search method, and the parameter combination with the best performance is selected.
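A minimal sketch of the kernel and of the grid search follows, assuming the common form of the Gaussian kernel, exp(-γ‖x - x_i‖²). The `score` callable is hypothetical: in practice it would train an SVM with the given C and γ and return a validation score, and the candidate value lists are illustrative only.

```python
import itertools
import math


def gaussian_kernel(x, xi, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)


def grid_search(score, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, C, gamma).

    `score(C, gamma)` is supplied by the caller; higher means better.
    """
    return max((score(c, g), c, g)
               for c, g in itertools.product(c_values, gamma_values))


# Toy score function standing in for a real train-and-validate routine.
def toy_score(c, gamma):
    return -((c - 10) ** 2 + (gamma - 0.1) ** 2)


best = grid_search(toy_score, [1, 10, 100], [0.01, 0.1, 1.0])
print(best[1], best[2])  # 10 0.1
```

A larger γ makes the kernel decay faster with distance, so each training point influences a smaller neighborhood; C trades margin width against training errors.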

The beneficial effects of the invention are as follows: soil data of the southern hilly areas, including soil type, texture, water content, organic matter content and pH value, are collected, and the southern hilly soil is modeled in three dimensions using the machine learning method SVM. The soil can thereby be classified, which helps farmers select suitable crops and improve crop yield and quality, and helps in understanding indexes such as the nutrient content and the acidity or alkalinity of the soil, so that a suitable soil improvement plan can be formulated.

Drawings

Fig. 1 is a flowchart of a machine learning-based three-dimensional modeling method for southern hilly soil.

Detailed Description

The present invention will be further described more fully hereinafter, but the scope of the invention is not limited thereto.

According to the orientation of the hilly terrain, sampling points are arranged on slopes of different orientations, namely the south-north slopes and the east-west slopes, in areas with obvious differences between shady and sunny (yin-yang) slopes, and at typical micro-terrain features;

The invention provides a machine learning-based three-dimensional modeling method for southern hilly soil. Soil data of the southern hilly area, including soil type, texture, water content, organic matter content and pH value, are collected, and the soil of the southern hilly area is modeled in three dimensions using the machine learning method SVM. The soil can thereby be classified, which helps in selecting suitable crops and improving crop yield and quality, and helps in understanding indexes such as the nutrient content and the acidity or alkalinity of the soil, so that a suitable soil improvement plan can be formulated.

Claims

1. The machine learning-based three-dimensional modeling method for the southern hilly soil is characterized by comprising the following steps of:

S7: after the model evaluation is passed, the model is used to predict and model soil data of a new southern hilly area.

2. The machine learning-based three-dimensional modeling method for southern hilly soil according to claim 1, wherein the step S1 of designing a sampling network according to the slope and orientation characteristics of the terrain, and performing soil sampling in different seasons comprises:

3. The machine learning-based southern hilly soil three-dimensional modeling method according to claim 1, wherein the soil type, texture, water content, organic matter content, PH, topography data, vegetation data, humidity data collection method in S1 comprises:

According to the soil classification system, recording the type data of the soil samples, namely red soil, yellow soil and brown soil; describing the size and composition of the soil particles by a texture classification method and recording them as texture data, namely sandy soil, loam and silt (powder soil); measuring the moisture content of the soil by the resistance method; measuring the organic matter content of the soil by the loss-on-ignition (combustion loss) method; measuring the pH value of the soil with an electronic pH meter; acquiring digital elevation model data of the region and recording them as terrain data; acquiring remote sensing data on vegetation coverage type and density and recording them as vegetation data; obtaining the climate data of the area, namely precipitation and relative humidity, and recording them as the humidity data of the soil.

4. The machine learning-based southern hilly soil three-dimensional modeling method according to claim 1, wherein the step S2 of selecting and deleting abnormal values and error values, standardizing the water content, organic matter content and pH value data, performing moving average and seasonal decomposition on the humidity data, normalizing the topography data and the vegetation data, and converting the soil type and texture data into numerical codes comprises:

For the pH value data, building a scatter diagram to visualize the data, quantitatively identifying the pH value data points that deviate significantly from the population by the Grubbs statistical method, and, drawing on soil science knowledge, recognizing as normal those deviations of the pH value from the conventional range that are caused by the specific soil types of the southern hills rich in iron and aluminum oxides, namely red soil and brick red soil;

5. A machine learning based southern hilly soil three-dimensional modeling method as defined in claim 1, wherein S3 is optimized for dimension reduction using principal component analysis PCA, comprising the steps of:

S31: calculating a covariance matrix from the preprocessed data;

6. The machine learning-based three-dimensional modeling method for southern hilly soil according to claim 1, wherein the step S3 of adding features of the terrain gradient and the slope direction and seasonal variation of vegetation, selecting a feature engineering for machine learning comprises:

7. The machine learning-based southern hilly soil three-dimensional modeling method as defined in claim 1, wherein the S5 model is optimized with regularization parameter C and bandwidth parameter γ in gaussian kernel function, and includes: