Article

Segmentation and Proportion Extraction of Crop, Crop Residues, and Soil Using Digital Images and Deep Learning

1 College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
2 School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
3 Key Laboratory of Emergency Satellite Engineering and Application, Ministry of Emergency Management, Beijing 100124, China
4 School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
5 International Institute for Earth System Science, Nanjing University, Nanjing 210023, China
6 Key Laboratory of Quantitative Remote Sensing in Agriculture, Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
7 Key Lab of Smart Agriculture System, Ministry of Education, China Agricultural University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(12), 2240; https://doi.org/10.3390/agriculture14122240
Submission received: 14 October 2024 / Revised: 30 November 2024 / Accepted: 4 December 2024 / Published: 6 December 2024

Abstract

Conservation tillage involves covering the soil surface with crop residues after harvest, typically through reduced or no-tillage practices. This approach increases the soil organic matter, improves the soil structure, prevents erosion, reduces water loss, promotes microbial activity, and enhances root development. Therefore, accurate information on crop residue coverage is critical for monitoring the implementation of conservation tillage practices. This study collected “crop–crop residues–soil” images from wheat-soybean rotation fields using mobile phones to create calibration, validation, and independent validation datasets. We developed a deep learning model named crop–crop residue–soil segmentation network (CCRSNet) to enhance the performance of cropland “crop–crop residues–soil” image segmentation and proportion extraction. The model enhances the segmentation accuracy and proportion extraction by extracting and integrating shallow and deep image features and attention modules to capture multi-scale contextual information. Our findings indicated that (1) lightweight models outperformed deeper networks for “crop–crop residues–soil” image segmentation. When CCRSNet employed a deep network backbone (ResNet50), its feature extraction capability was inferior to that of lighter models (VGG16). (2) CCRSNet models that integrated shallow and deep features with attention modules achieved a high segmentation and proportion extraction performance. Using VGG16 as the backbone, CCRSNet achieved an mIoU of 92.73% and a PA of 96.23% in the independent validation dataset, surpassing traditional SVM and RF models. The RMSE for the proportion extraction accuracy ranged from 1.05% to 3.56%. These results demonstrate the potential of CCRSNet for the accurate, rapid, and low-cost detection of crop residue coverage. However, the generalizability and robustness of deep learning models depend on the diversity of calibration datasets. Further experiments across different regions and crops are required to validate this method’s accuracy and applicability for “crop–crop residues–soil” image segmentation and proportion extraction.

1. Introduction

Crop residues are one of the important solid waste resources in farmlands, playing a crucial role in sustaining the ecological environment of agricultural fields [1]. Crop residues can increase the soil organic matter, improve the soil structure, prevent soil erosion, reduce soil and water loss, promote microbial activity, and enhance the development of crop roots [2,3,4,5,6]. The crop residue coverage is vital for implementing conservation tillage practices. China places great importance on the comprehensive utilization of crop residues and vigorously promotes policies for crop residue incorporation into the soil [7]. Reasonable crop residue coverage can increase the soil organic matter by 7.59–14.60% and the crop yield by 4.11–11.8% [8]. Therefore, understanding the information on crop residue coverage is essential for maintaining farmland ecology and increasing crop yields [1].
Image classification and line-point transect methods are two primary manual techniques for assessing field crop residue coverage [6]. However, these manual methods are time-consuming and labor-intensive, and their low accuracy presents a significant limitation. In recent years, image segmentation algorithms have been increasingly applied in agriculture [9,10,11]. Image segmentation, a core task in computer vision, initially focused on basic image processing techniques. Early segmentation methods primarily relied on simple thresholding, where images were divided into regions based on grayscale values [12]. While thresholding is computationally simple, its effectiveness is limited by the image quality and lighting conditions [13]. Later, segmentation techniques began incorporating statistical models and graph theory. For instance, Gaussian mixture models (GMMs) and Markov random fields (MRFs) were developed for more complex segmentation tasks [14,15]. GMMs differentiate regions using probabilistic pixel modeling, while MRFs address local consistency by considering spatial relationships between pixels [16,17]. These approaches, however, are computationally intensive and require a precise initialization. As computing power advanced, segmentation methods evolved to include shape- and edge-detection techniques. Edge-based methods, such as the Canny edge detector and Sobel operator, became popular for detecting image edges [5,18]. These methods perform image segmentation by detecting edges within the image. Additionally, active contour models (e.g., Snakes) and level set methods were developed, improving the detection of complex shapes and boundaries [19,20]. These models enhance segmentation accuracy, especially in images with well-defined boundaries [21]. These methods effectively address challenges related to varying scales and complex structures that earlier approaches struggled with. However, manually defined image features often fail to capture the full complexity of environments, limiting traditional algorithms to specific image types and hindering their performance in more intricate scenes. Although these methods are simple and interpretable, they remain competitive in certain applications.
Machine learning methods have also been introduced into image segmentation, with support vector machine (SVM) and random forest (RF) algorithms emerging as important tools [22,23]. SVM performs classification and segmentation by constructing a hyperplane in a high-dimensional feature space, while RF enhances robustness and accuracy by integrating multiple decision trees with random feature selection and sample classification [7,24]. Nevertheless, when processing images of soil and crop residues with similar color features, traditional machine learning algorithms often suffer from poor generalization, resulting in frequent over-segmentation and under-segmentation.
Deep learning-based image segmentation has increasingly been applied to real-world scenarios [25]. Fully convolutional networks (FCNs) replace the fully connected layers of classification networks with convolutional layers, enabling dense pixel-wise prediction rather than a single image-level class probability [26]. More advanced convolutional neural networks, such as ResNet and MobileNet, achieve better performance by stacking convolutional layers and utilizing depthwise separable convolution to improve model efficiency and speed while maintaining high accuracy [27]. Ronneberger et al. [28] developed U-Net, which utilizes skip connections between the encoder and decoder to capture contextual information, particularly for medical image segmentation. Chen et al. [29] introduced DeepLab, which incorporates atrous convolution to improve multi-scale perception capabilities. However, deep learning models often overlook the semantic context across different scales, leading to inadequate segmentation accuracy, particularly in distinguishing similar features [30]. This issue becomes more pronounced in agricultural applications, where crop residues and soil often have similar characteristics, resulting in a lower segmentation accuracy compared to natural or man-made environments.
In recent years, the introduction of attention mechanisms and self-attention models, such as the Transformer, has significantly advanced image segmentation technology [31]. Methods like the Vision Transformer (ViT), which integrate self-attention mechanisms, have further improved segmentation accuracy and generalization [32]. While increasing the number of convolutional layers enhances the model’s feature extraction capabilities, deepening the network structure raises the computational costs and demands higher hardware performance for model deployment. Existing research often emphasizes deep feature extraction, overlooking shallow feature information. This over-reliance on deep features can degrade model performance in areas rich in details or with ambiguous boundaries. For instance, the color similarity between crop residues and soil, combined with varying camera angles, results in complex distributions of “soil–crop residue” targets in farmland images. Therefore, when designing deep learning-based image segmentation models for separating and extracting the proportions of “crop–crop residue–soil” in farmland, it is crucial to balance deep and shallow features and fully leverage semantic context at multiple scales to achieve accurate information extraction.
In recent years, many machine learning algorithms have been applied to crop residue coverage extraction [33,34,35]. For example, Zhou et al. proposed a method to extract crop residue coverage from UAV (unmanned aerial vehicle) images using an improved deep learning model, and Yue et al. combined a radiative transfer model with a deep learning algorithm to map crop residue coverage from remote sensing images [3,36]. These results show that, for different image types (such as UAV and satellite remote sensing images), appropriate models and algorithms can effectively support the extraction and mapping of crop residue coverage.
The primary aim of this study was to propose a method for segmenting and extracting the proportions of crops, crop residues, and soil from digital images using deep learning. A deep learning model, crop–crop residue–soil segmentation network (CCRSNet), was developed to improve the performance of image segmentation and proportion extraction in cropland areas. CCRSNet integrates both shallow and deep image features and incorporates a pyramid pooling module to capture contextual information at multiple scales. The study compared the performance of current machine learning-based segmentation methods, including RF and SVM, with the proposed CCRSNet model. The results demonstrate that CCRSNet, by leveraging shallow feature information and the pyramid pooling module, achieves superior segmentation and proportion extraction of crops, crop residues, and soil.

2. Materials and Methods

2.1. Study Area and Experimental Design

2.1.1. Study Area

The study area is located in Yuanyang County, Xinxiang City, Henan Province, China (Figure 1a; 113°36′~114°15′ E, 34°55′~35°11′ N). Yuanyang County is situated in a mid-latitude zone, characterized by a warm temperate continental monsoon climate, with an average annual precipitation of 490 mm. The local agriculture primarily involves wheat-corn and wheat-soybean rotations, with wheat typically sown in mid-October and both soybeans and corn generally planted in late June.

2.1.2. Field Experimental Design

We collected field “crop–crop residue–soil” images at two soybean growth stages in the summer of 2023, from seeding (23 July) to flowering (25 August), because “crop–crop residue–soil” scenes rarely occur under medium to high soybean cover. This study employed an Apple iPhone 14 Pro (48-megapixel mode) to capture field images of “crop–crop residue–soil.” The images were collected under sunny conditions in a conservation tillage soybean field (Figure 1b), with the goal of obtaining high-definition digital images of the soil, crop residue, and crops. At the time of collection, the crop height ranged between 20 and 30 cm, and the phone was held manually at a height of 100–110 cm above the ground surface, oriented vertically downward (Figure 1c), to keep image quality as high as possible and to reduce errors caused by image distortion. A total of 200 images depicting “crop–crop residue–soil” scenes were taken. However, due to unavoidable distortions and human interference, shadows from personnel and other artifacts, such as shoes, appeared at the edges of some images. As a result, it was necessary to manually crop these edge-affected areas for deep learning model calibration, quantitative validation, and visual assessment. The cropping process was carried out in two ways:
  • For constructing the training dataset and performing quantitative validation, images with a pixel size of 256 × 256 × 3 were extracted from the central region of the original images and labeled according to land cover categories (a minimal cropping sketch is given after this list).
  • For field image segmentation applications and visual evaluations, only the edge-affected areas were manually removed.
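As an illustration of the first cropping mode, the following is a minimal sketch (assuming Pillow and hypothetical file names; the authors' exact cropping tool is not specified) for extracting a 256 × 256 patch from the centre of an original image:

```python
from PIL import Image

def center_crop(image_path, size=256):
    """Extract a size x size RGB patch from the centre of a field image."""
    img = Image.open(image_path).convert("RGB")
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

# Hypothetical usage: center_crop("field_0001.jpg").save("patch_0001.png")
```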

2.2. Data Pre-Processing

2.2.1. Data Annotation

The Labelme annotation platform was employed to manually label the cropped images [37]. Upon completion of the annotation, the exported JSON files were converted into single-channel grayscale label masks, which served as labels for subsequent deep learning model calibration. The images were categorized into three classes: soil, crop, and crop residue (Table 1). Figure 2 illustrates the original images and their corresponding labels during the dataset annotation process.
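For illustration (not the authors' exact conversion script), a minimal sketch of rasterising Labelme JSON polygons into a single-channel label mask, assuming the label strings and class indices follow Table 1:

```python
import json
import numpy as np
from PIL import Image, ImageDraw

CLASS_IDS = {"crop": 0, "crop residue": 1, "soil": 2}  # assumed label strings (see Table 1)

def labelme_json_to_mask(json_path, width=256, height=256):
    """Rasterise Labelme polygon annotations into a single-channel grayscale mask."""
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:                        # Labelme stores polygons under "shapes"
        polygon = [tuple(p) for p in shape["points"]]
        draw.polygon(polygon, fill=CLASS_IDS[shape["label"]])
    return np.array(mask, dtype=np.uint8)
```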

2.2.2. Data Augmentation

After data annotation, we divided the dataset into a training and validation dataset (TVD; n = 160) and an independent validation dataset (IVD; n = 40). The TVD dataset was used to calibrate and validate the machine learning and deep learning models, while the IVD dataset, containing 40 digital images, was employed for independent evaluation to further assess the model’s performance in real-world scenarios. This step was essential for evaluating the model’s generalization ability by testing its effectiveness on unseen data.
The TVD dataset consisted of 160 digital farmland images, which were augmented in various ways to increase diversity and complexity, thereby improving the dataset’s representativeness. Specifically, three augmentation techniques were applied:
  • Rotation: A random angle between −90° and 90° was used to rotate the images, altering the positions of the categories to be segmented.
  • Horizontal mirroring: This technique was applied to create left-right symmetrical versions of the original images.
  • Vertical mirroring: This technique generated top–bottom symmetrical versions of the original images.
After augmentation, we obtained 960 digital farmland images along with their corresponding annotation results. Of these, 75% (n = 720) were designated as the calibration set for model training, while the remaining 25% (n = 240) were allocated as the validation set for initial performance assessment and tuning.
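A minimal sketch of the three augmentations (using Pillow; the exact augmentation pipeline and how the variants were combined to reach 960 images are not specified, so this is only illustrative). The image and its label mask are transformed identically so the annotations stay aligned:

```python
import random
from PIL import Image

def augment(image, mask):
    """Random rotation between -90 and 90 degrees, plus horizontal and vertical mirroring."""
    angle = random.uniform(-90, 90)
    image = image.rotate(angle, resample=Image.BILINEAR)
    mask = mask.rotate(angle, resample=Image.NEAREST)   # nearest neighbour keeps labels discrete
    if random.random() < 0.5:                           # horizontal mirroring
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        mask = mask.transpose(Image.FLIP_LEFT_RIGHT)
    if random.random() < 0.5:                           # vertical mirroring
        image = image.transpose(Image.FLIP_TOP_BOTTOM)
        mask = mask.transpose(Image.FLIP_TOP_BOTTOM)
    return image, mask
```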

2.3. Technical Workflow

This study focused on a dataset of images featuring crops, crop residue, and soil from various agricultural fields, applying different machine learning and deep learning models for image segmentation. The primary goal was to extract the proportions of each category and compare the models’ performance. The technical workflow, illustrated in Figure 3, consisted of the following four steps:
  • Image acquisition: Images of crops, crop residue, and soil were collected at different times from agricultural fields within a specified region. The TVD dataset was used for calibration and validation, while the IVD dataset served as an independent validation set to assess the models’ generalization performance and robustness.
  • Data preprocessing: The dataset was manually annotated, categorizing the images into three classes: “soil–crop–crop residue”. Data augmentation techniques, such as rotation and mirroring, were applied to enhance the dataset.
  • Model construction: The proposed CCRSNet model architecture utilized VGG16 and ResNet50 as backbone networks.
  • Accuracy verification: Based on the TVD and IVD datasets, the segmentation accuracies of CCRSNet, RF, and SVM for “soil–crop–crop residue” in farmland were compared. Ablation experiments were also conducted to assess the contributions of different components within CCRSNet.

2.4. CCRSNet Segmentation Model

Semantic segmentation is a core task in the field of computer vision, aiming to assign a distinct class label to each pixel in an image, thereby generating a segmentation image of the same size as the original. The rapid development of CNN-based semantic segmentation models has led to the emergence of many innovative network architectures, such as PSPNet, the DeepLab series, and the U-Net series [38,39].

2.4.1. Backbone Network

The backbone network is a crucial component of deep learning networks responsible for feature extraction from input images. The extracted features are subsequently fed into other parts of the network, and the backbone network can be replaced with various architectures. Commonly utilized backbone networks include VGG16 and ResNet50, which are primarily classification networks known for their superior performance [40].
VGG16 was first proposed by the Visual Geometry Group at the University of Oxford [41]. The VGG model is composed of multiple 2 × 2 pooling layers and 3 × 3 convolution kernels. It has a simple and clear structure and is an excellent network for image recognition and classification. Although the VGG model has a large number of parameters and high computational cost, its contribution to deep learning and computer vision research is still significant, especially in understanding the effectiveness of the neural network hierarchical structure.
The ResNet50 network introduced the concept of residual learning [42]. Owing to the powerful feature extraction ability of the residual structure, it is widely used in various deep learning networks. The deep residual network effectively overcomes the reduction in learning efficiency and the stagnation of accuracy caused by increasing network depth. ResNet is mainly composed of convolutional residual blocks (conv blocks) and identity (skip) residual blocks: the convolutional residual block changes the network dimensions, while the identity block deepens the network. As a result, ResNet performs well in image recognition and classification.
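To illustrate how shallow and deep feature maps can be taken from a pretrained backbone, the following is a minimal sketch using torchvision's VGG16; the cut points shown are one reasonable choice and are not necessarily the exact stages fused in CCRSNet:

```python
import torch
import torchvision

class VGG16Backbone(torch.nn.Module):
    """Returns shallow, intermediate, and deep feature maps from a pretrained VGG16
    so that shallow and deep features can later be fused in a decoder."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1")  # torchvision >= 0.13 weights API
        self.stage1 = vgg.features[:9]     # 128 channels, 1/2 resolution (shallow)
        self.stage2 = vgg.features[9:23]   # 512 channels, 1/8 resolution
        self.stage3 = vgg.features[23:30]  # 512 channels, 1/16 resolution (deep)

    def forward(self, x):
        shallow = self.stage1(x)
        middle = self.stage2(shallow)
        deep = self.stage3(middle)
        return shallow, middle, deep

# Shape check: s, m, d = VGG16Backbone()(torch.randn(1, 3, 256, 256))
```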

2.4.2. CCRSNet Model Architecture

This study introduces a novel network architecture named CCRSNet, which employs ResNet50 or VGG16 as the backbone and incorporates channel and spatial attention mechanisms and a pyramid pooling module during upsampling. Current research often focuses on the deep features of convolutional networks, while CCRSNet enhances performance by integrating both shallow and deep image features from the backbone network, allowing for a more comprehensive capture of various feature information within images. This design enables CCRSNet to efficiently extract and utilize rich information from images, thereby improving segmentation accuracy and the performance and generalization ability of proportion extraction for crops, crop residue, and soil in farmland. The specific network architecture is depicted in Figure 4.
The following is an explanation of part of the structure of CCRSNet:
  • Input: The input image size is 256 × 256 × 3.
  • Backbone: VGG16 and ResNet50 were selected as the backbone networks for CCRSNet.
  • Pyramid pooling module (PPM): PPM is composed of multiple pooling layers, each using windows of different sizes (including 1 × 1, 2 × 2, 3 × 3, and 6 × 6 pooling windows in this study) to extract contextual information of images at different scales. Subsequently, the pooled information is adjusted to the same size and concatenated through upsampling [43].
  • Attention module: The attention mechanism module in this study includes channel attention and spatial attention mechanisms, with lightweight characteristics and minimal impact on model complexity [44]. A sketch of the PPM and attention modules is given after this list.
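A minimal PyTorch sketch of the two modules described above; the channel widths, reduction ratio, and exact wiring are illustrative assumptions rather than the published CCRSNet configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling with the 1x1, 2x2, 3x3, and 6x6 bins listed above;
    pooled maps are upsampled back to the input size and concatenated with the input."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), kernel_size=1, bias=False),
                          nn.BatchNorm2d(in_ch // len(bins)),
                          nn.ReLU(inplace=True))
            for b in bins])

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
                  for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)            # output has 2 * in_ch channels

class ChannelSpatialAttention(nn.Module):
    """Lightweight CBAM-style channel attention followed by spatial attention."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        x = x * ca.view(b, c, 1, 1)                                   # channel attention
        sa = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(sa))                    # spatial attention

# Shape check: PyramidPooling(512)(torch.randn(1, 512, 16, 16)).shape -> (1, 1024, 16, 16)
```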

2.5. Baseline Models

This study selected two machine learning models, RF and SVM, as baseline models to achieve segmentation-like effects by classifying image pixels and analyzing the resulting confusion matrices. The proposed CCRSNet segmentation model’s performance in segmenting and proportionally extracting crops, crop residue, and soil in farmland was explored.
RF is an ensemble learning method. RF constructs multiple decision trees and aggregates their results to improve prediction accuracy and stability [45]. The fundamental concept combines several decision trees to mitigate overfitting issues and enhance prediction accuracy. In RF, each decision tree is trained using a random subset of the data and a random subset of features, thus increasing model diversity and generalization capacity.
SVM is widely used in machine learning and pattern recognition. It optimizes classification accuracy by maximizing the margin between the hyperplane and the nearest data points (support vectors), which helps enhance the model’s generalization ability and reduce classification errors. When data are not linearly separable, SVM employs kernel functions (such as the Gaussian kernel) to map data to a higher-dimensional space, achieving linear separability in the new space [46].
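For reference, a minimal sketch of how pixel-wise baselines of this kind might be set up with scikit-learn; raw RGB values are used as features here purely for illustration, as the exact feature set used for the RF and SVM baselines is not detailed in the text:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def fit_pixel_baseline(images, masks, model="rf"):
    """Pixel-wise baseline: each pixel's RGB values are the features and its class
    index (0 crop, 1 crop residue, 2 soil) is the target.
    images: (N, H, W, 3) uint8; masks: (N, H, W) uint8."""
    X = images.reshape(-1, 3).astype(np.float32) / 255.0
    y = masks.reshape(-1)
    clf = (RandomForestClassifier(n_estimators=100, random_state=42) if model == "rf"
           else SVC(kernel="linear", random_state=42))
    clf.fit(X, y)
    return clf

# Prediction on a new image: clf.predict(img.reshape(-1, 3) / 255.0).reshape(img.shape[:2])
```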

2.6. Deep Learning Model Parameter Settings

This study used a workstation equipped with an Intel i7-13650HX processor, 24 GB of RAM, and an RTX 4070 GPU (8 GB VRAM) for model calibration. During deep learning model calibration, the first five epochs employed transfer learning with pretrained parameters from the backbone networks (VGG16 and ResNet50), and 100 training epochs were set in total. The RF model was calibrated with 100 decision trees and a random seed of 42. The SVM was calibrated with a linear kernel function, also with a random seed of 42. Each model underwent three calibration and validation iterations, and the optimal result was retained.
During model calibration, a batch size of 2 was used while the backbone parameters were frozen, and a batch size of 16 was used after they were unfrozen. The Adam optimizer was employed, with the learning rate constrained between a maximum of 1 × 10⁻⁴ and a minimum of 1 × 10⁻⁶. Cross-entropy loss was incorporated into the loss function to address the extremely imbalanced sample sizes across categories.
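A minimal sketch of these calibration settings in PyTorch; the cosine learning-rate schedule and the tiny stand-in network are assumptions made only so the snippet runs on its own, not the authors' exact training script:

```python
import torch
import torch.nn as nn

# Stand-in segmentation head used only so the snippet is self-contained;
# in practice it would be replaced by the full CCRSNet model.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 1))
criterion = nn.CrossEntropyLoss()                           # pixel-wise cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # maximum learning rate
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6)

images = torch.randn(2, 3, 256, 256)                        # batch size 2 (frozen phase)
labels = torch.randint(0, 3, (2, 256, 256))                 # per-pixel class indices 0-2

for epoch in range(100):                                    # 100 training epochs
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
```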

2.7. Segmentation and Classification Accuracy Evaluation Metrics

The confusion matrix is a widely used tool for evaluating model performance. Table 2 presents the confusion matrix for a binary classification problem. In the confusion matrix, TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. The confusion matrix can be applied to both binary and multi-class classification problems.
Additionally, standard evaluation metrics for semantic segmentation derived from the confusion matrix include intersection over union (IoU), mean IoU (mIoU), recall, and pixel accuracy (PA).
  1. Intersection over union (IoU): the ratio of the intersection of the predicted results and the true annotations to their union, reflecting the model’s prediction performance for each category.
$$IoU = \frac{TP}{TP + FP + FN}$$
  2. Mean IoU (mIoU): the average of the IoUs across all categories, indicating the model’s average segmentation performance and serving as a crucial metric for assessing segmentation quality.
$$mIoU = \frac{1}{N}\sum_{i=1}^{N} IoU_i$$
  3. Pixel accuracy (PA): the proportion of correctly predicted pixels to the total number of pixels, reflecting the model’s overall prediction accuracy, although it does not consider the performance of each category.
$$PA = \frac{TP + TN}{TP + FP + FN + TN}$$
  4. Recall: the proportion of correctly predicted positive samples among all actual positives, reflecting the model’s ability to detect true positives.
$$Recall = \frac{TP}{TP + FN}$$
Generally, higher values of mIoU, IoU, recall, and PA indicate better model performance.
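All four metrics can be computed directly from a multi-class confusion matrix; a small NumPy sketch (the example matrix at the end is illustrative only):

```python
import numpy as np

def segmentation_metrics(conf):
    """Per-class IoU and recall, mIoU, and PA from a K x K confusion matrix whose
    rows are actual classes and whose columns are predicted classes."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / (tp + fp + fn)
    recall = tp / (tp + fn)
    return {"IoU": iou, "mIoU": iou.mean(), "recall": recall, "PA": tp.sum() / conf.sum()}

# Illustrative 3-class matrix (crop, crop residue, soil):
# print(segmentation_metrics(np.array([[50, 2, 1], [3, 40, 5], [1, 4, 60]])))
```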

2.8. Proportion Extraction Evaluation Metrics

To evaluate the accuracy of proportion extraction for the three categories of “soil–crop–crop residue”, we employed RMSE and the coefficient of determination (R²).
$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{m}\left(\bar{y} - y_i\right)^2}$$
where $y_i$ represents the actual measured value, $\bar{y}$ is the sample mean, $\hat{y}_i$ denotes the estimated value, and $m$ indicates the sample size. For the same sample data, a higher R² and a lower RMSE generally indicate better model accuracy.
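Equivalently, in NumPy (a small sketch; the values passed in would be the per-image class proportions):

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true.mean() - y_true) ** 2)
    return float(1.0 - ss_res / ss_tot)

# e.g. rmse([0.30, 0.45, 0.25], [0.28, 0.47, 0.26]) gives the proportion RMSE on a 0-1 scale
```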

3. Results

3.1. Model Calibration and Validation Based on the TVD Dataset

Table 3 presents the validation accuracy of the RF and SVM models based on the TVD dataset. The RF model achieved the higher accuracy (mIoU: 71.29%), while the SVM model performed worst (mIoU: 21.64%). We also trained and validated the proposed CCRSNet model. The mIoU and loss curves of the CCRSNet segmentation network during calibration are shown in Figure 5: with VGG16 as the backbone, the loss decreased more rapidly and converged to a smaller value than with ResNet50. Table 3 also presents the accuracy evaluation results for CCRSNet based on the TVD dataset. Our findings demonstrate that the CCRSNet model, using VGG16 as the backbone network, achieved the best performance (mIoU: 87.84%, PA: 93.70%), an improvement of 16.55 percentage points in mIoU over the RF model. Moreover, compared with ResNet50, the VGG16 backbone has a simpler structure, fewer parameters, and lower storage and computing requirements, and can therefore perform better on smaller datasets.

3.2. Model Evaluation Based on the IVD Dataset

We compared the segmentation accuracy of different models based on the IVD dataset (Table 4). The results indicate that the deep learning models achieved a higher accuracy than the traditional machine learning models, and that the CCRSNet model with VGG16 as the backbone network performed best on the IVD dataset (mIoU: 92.73%, PA: 96.23%). The CCRSNet model can effectively extract feature information from segmentation targets even when the number of validation samples is relatively small.

3.3. CCRSNet Ablation Study Based on the TVD and IVD Datasets

To further validate the effectiveness of (1) the attention module and (2) the deep and shallow feature structure module in the proposed CCRSNet, this study conducted ablation experiments using the TVD and IVD datasets. The three comparative ablation experiments and their respective architectures were as follows:
(i)
Exp. 1: CCRSNet (Figure 4);
(ii)
Exp. 2: CCRSNet without the deep and shallow feature structure (Appendix A, Figure A1a);
(iii)
Exp. 3: CCRSNet without the attention module (Appendix A, Figure A1b).
Table 5 presents the segmentation accuracy for each category and the mIoU based on the TVD and IVD datasets. The results show that, for both datasets, the segmentation accuracy and mIoU in Experiments 2 and 3 were lower than in Experiment 1. Experiment 3, which lacked the attention module, still showed a 0.15 percentage point higher mIoU on the independent validation set than Experiment 2, which did not utilize shallow features. This indicates that integrating shallow feature information enhanced the model’s generalization capability and robustness.

3.4. Segmentation Results and Proportional Extraction Accuracy Based on CCRSNet

Based on the model’s performance on the TVD and IVD datasets, we ultimately selected the CCRSNet model with VGG16 as the backbone network (TVD dataset: mIoU of 87.84%, PA of 93.70%; IVD dataset: mIoU of 92.73%, PA of 96.23%) to perform “crop–crop residues–soil” image segmentation and proportional extraction for farmland images. Figure 6 shows two randomly selected farmland images along with their annotation and prediction results, while Figure 7 displays original farmland images and their corresponding predictions to illustrate the model’s performance under real-world conditions. Figure 8 shows Grad-CAM visual explanations, which highlight the image regions that the model considers most important for a given prediction; the model pays the greatest attention to crop residue. Our results demonstrate that the CCRSNet model, using VGG16 as the backbone, is highly effective for “crop–crop residues–soil” image segmentation of farmland images.
We performed a statistical analysis and accuracy evaluation of the model’s proportional extraction results based on the TVD dataset. Figure 9a–c show scatter plots comparing the model’s predicted proportions with the actual proportions in the TVD dataset. Figure 9a–c corresponds to the proportional extraction of the three categories, respectively. Our results show that the R² for all three categories exceeded 0.99. Using the RMSE as the metric, the crop proportion extraction performed best (RMSE = 0.55%), followed by crop residues (RMSE = 0.80%), while soil extraction showed the lowest performance (RMSE = 0.89%). We also conducted a statistical analysis and accuracy evaluation on the model’s proportional extraction results using the IVD dataset. Figure 9d–f shows scatter plots comparing the predicted and actual proportions in the IVD dataset. Figure 9d–f represents the proportional extraction of the three categories, respectively. Similar to the TVD dataset, our results demonstrate that the R² for all three categories was above 0.99. In terms of the RMSE, the crop proportion extraction again performed best (RMSE = 1.05%), followed by crop residues (RMSE = 3.13%), and soil (RMSE = 3.56%) showed the weakest performance. The CCRSNet model provided high-accuracy proportional extraction results for the three categories.
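The per-class proportions compared in Figure 9 follow directly from the predicted label masks; a minimal sketch of that final step:

```python
import numpy as np

def class_proportions(pred_mask, n_classes=3):
    """Fraction of pixels assigned to each class (crop, crop residue, soil)
    in a predicted label mask."""
    counts = np.bincount(pred_mask.reshape(-1), minlength=n_classes)[:n_classes]
    return counts / counts.sum()

# Illustrative use: class_proportions(np.random.randint(0, 3, (256, 256)))
```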

4. Discussion

4.1. Advantages of Deep Learning in Image Segmentation and Proportion Extraction

Traditional machine learning algorithms for image segmentation typically rely on expert experience for feature selection and extraction, which is time-consuming and labor-intensive. Moreover, when facing large-scale datasets, these algorithms have high computational demands and limited generalization capabilities, making it challenging to model nonlinear data. In contrast, deep learning models trained on large-scale data can learn more refined features, resulting in higher segmentation accuracy and stable performance across different complex backgrounds. Compared to traditional machine learning algorithms (such as RF and SVM), deep learning models (like CCRSNet) can automatically learn and extract multi-level features from images without the need for manual feature design [47].
Compared to benchmark models such as RF and SVM, the deep learning model CCRSNet designed in this study stands out for its ability to extract and integrate features from both shallow and deep layers of images. Furthermore, it incorporates an attention module to capture contextual information at different scales within the image. This enhances its generalization capabilities and robustness, allowing the model to adapt to various agricultural environments. As shown in Table 3 and Table 4, the CCRSNet model, utilizing VGG16 as its backbone network, delivers high-performance segmentation and proportion extraction of crop, crop residues, and soil in farmlands. The model achieved an mIoU of 87.84% on the TVD dataset and 92.73% on the IVD dataset, making it the most accurate model among those tested. In summary, the CCRSNet model exhibits significant advantages over traditional machine learning algorithms regarding accuracy, generalization capability, and overall performance, particularly for complex tasks such as digital image segmentation and the proportion extraction of crop, crop residues, and soil in agricultural settings.
Numerous studies have demonstrated that segmentation algorithms based on deep learning can provide high-precision segmentation results in various natural scene images, such as those of buildings and people [48,49]. The CCRSNet model mentioned in this paper achieved an mIoU of 87.84% and a PA of 93.70% on the TVD dataset. Consequently, our research further indicates that the deep learning-based CCRSNet can distinguish between similar features, such as crop residues and soil, in digital images of farmlands. Additionally, our findings show that the RMSE for the proportion extraction accuracy of crop, crop residues, and soil using CCRSNet ranged from 1.05% to 3.56%. This suggests that the designed deep learning model can offer remarkably high precision in providing information on crop residue coverage, potentially aiding agricultural experts in implementing conservation tillage practices.

4.2. Disadvantages of Deep Learning for Crop, Crop Residues, and Soil Proportion Extraction

Deep learning models’ reliance on a large volume of manually annotated data during calibration notably impacts their performance in segmentation tasks. The model must be trained using extensive, high-quality annotated data to achieve satisfactory segmentation results. This typically involves considerable workforce and time investment in the dataset preparation phase to collect and annotate sufficient agricultural image data. This process is cumbersome and highly time-consuming, especially when dealing with targets with similar features, as the workload for annotation can substantially increase. Additionally, obtaining high-quality annotated data in practical applications poses various challenges. For instance, seasonal changes may cause inconsistencies in target features across different seasons, affecting the model’s calibration effectiveness. Weather conditions can also impact data collection; for example, rainy or foggy weather might reduce image quality, affecting the data’s accuracy and reliability [50]. The high cost of data collection equipment is another significant factor, particularly in agricultural production, where high-precision remote sensing devices or drones may require substantial financial investments, posing a formidable barrier for small farms or research institutions with limited resources.
The calibration of deep learning models depends on extensive labeled data and substantial computational resources. The calibration of deep neural networks typically requires high-performance computing hardware, such as GPUs. Moreover, the model’s inference stage also demands certain computational capabilities, which could be a limiting factor for some low-cost devices or field application scenarios. In real-time applications, insufficient computational resources could lead to delays in model inference, potentially affecting the efficiency of final decision-making processes [51].
The “black box” nature of deep learning models makes their internal decision-making processes opaque, leading to poor interpretability [52]. In application fields like agriculture, users often desire to understand the basis of the model’s decisions to make more accurate management decisions. The decision-making processes of deep learning models are usually difficult to intuitively comprehend, and this lack of interpretability could limit their practical application. If the model’s decision-making process cannot be explained, users might question the model’s reliability.
Deep learning models’ universality and generalization ability heavily depend on the comprehensiveness of the calibration data. Validation under broader conditions is necessary to ensure excellent performance across different regions, environments, and crops. Furthermore, updating and optimizing the model is a continuous process. As new data and knowledge accumulate, the model needs ongoing adjustments and improvements to maintain its performance and accuracy. This process requires not only technical support, but also sustained research efforts. Continuous optimization and updates ensure the model remains effective in constantly changing environments.

5. Conclusions

In this study, we designed the CCRSNet network model to enhance the segmentation and proportional extraction performance of crop, crop residues, and soil based on high-resolution digital images. CCRSNet improves the performance by extracting and fusing both shallow and deep image features and integrating an attention module to capture multi-scale contextual information. The main conclusions of this study are as follows:
(i)
Compared to traditional machine learning models (e.g., RF and SVM), deep learning models are more suitable for the segmentation and proportional extraction of farmland “crop–crop residues–soil.” When using VGG16 as the backbone network, CCRSNet achieved a pixel accuracy of 96.23% on the IVD dataset, notably higher than that of traditional models such as SVM (58.00%) and RF (82.47%).
(ii)
The CCRSNet model, which is capable of fusing shallow and deep image features and incorporates an attention module, provides high-performance results for farmland “crop–crop residues–soil” image segmentation and proportional extraction. On the IVD independent validation dataset, the CCRSNet model with a lightweight backbone network achieved an mIoU of 92.73%, a PA of 96.23%, and proportional extraction RMSE values between 1.05% and 3.56%.
The generalization and robustness of machine learning models are dependent on the comprehensiveness of the calibration dataset. However, the method proposed in this study was only tested on the constructed TVD and IVD datasets. Future research should conduct experiments in additional regions and on a wider variety of crops to validate the accuracy and generalizability of the proposed method for “crop–crop residues–soil” image segmentation and proportional extraction.

Author Contributions

Methodology, G.G. and J.Y.; validation, G.G.; investigation, G.G., J.S., Y.Y., Y.F. and J.Y.; writing—original draft, G.G.; writing—review and editing, G.G., S.Z., J.S., K.H., J.T., Y.Y., Q.T., Y.F., H.F., Y.L. and J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from Key Laboratory of Emergency Satellite Engineering and Application, Ministry of Emergency Management, and the National Natural Science Foundation of China (42101362, 42101321).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset is available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

To further verify the effectiveness of (1) the attention module and (2) the deep and shallow feature structure module in the proposed CCRSNet, we conducted ablation experiments based on the TVD and IVD datasets. The ablation experiments (Exp. 2 and Exp. 3) and corresponding architectures are shown in Figure A1.
Figure A1. Ablation experiment architectures. (a) CCRSNet without the deep and shallow feature structure, (b) CCRSNet without the attention module.

References

  1. Liu, J.; Qiu, T.; Peñuelas, J.; Sardans, J.; Tan, W.; Wei, X.; Cui, Y.; Cui, Q.; Wu, C.; Liu, L.; et al. Crop Residue Return Sustains Global Soil Ecological Stoichiometry Balance. Glob. Chang. Biol. 2023, 29, 2203–2226. [Google Scholar] [CrossRef] [PubMed]
  2. Delandmeter, M.; Colinet, G.; Pierreux, J.; Bindelle, J.; Dumont, B. Combining Field Measurements and Process-based Modelling to Analyse Soil Tillage and Crop Residues Management Impacts on Crop Production and Carbon Balance in Temperate Areas. Soil Use Manag. 2024, 40, 13098. [Google Scholar] [CrossRef]
  3. Yue, J.; Tian, Q.; Liu, Y.; Fu, Y.; Tian, J.; Zhou, C.; Feng, H.; Yang, G. Mapping Cropland Rice Residue Cover Using a Radiative Transfer Model and Deep Learning. Comput. Electron. Agric. 2023, 215, 108421. [Google Scholar] [CrossRef]
  4. Su, Y.; Gabrielle, B.; Makowski, D. The Impact of Climate Change on the Productivity of Conservation Agriculture. Nat. Clim. Chang. 2021, 11, 628–633. [Google Scholar] [CrossRef]
  5. Gao, P.; Song, Y.; Minhui, S.; Qian, P.; Su, Y. Extract Nanoporous Gold Ligaments from SEM Images by Combining Fully Convolutional Network and Sobel Operator Edge Detection Algorithm. SSRN Electron. J. 2021, 365, 536–538. [Google Scholar] [CrossRef]
  6. Yue, J.; Tian, Q.; Tang, S.; Xu, K.; Zhou, C. A Dynamic Soil Endmember Spectrum Selection Approach for Soil and Crop Residue Linear Spectral Unmixing Analysis. Int. J. Appl. Earth Obs. Geoinf. 2019, 78, 306–317. [Google Scholar] [CrossRef]
  7. Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
  8. Islam, M.U.; Guo, Z.; Jiang, F.; Peng, X. Does Straw Return Increase Crop Yield in the Wheat-Maize Cropping System in China? A Meta-Analysis. F. Crop. Res. 2022, 279, 108447. [Google Scholar] [CrossRef]
  9. Yue, J.; Tian, Q.; Dong, X.; Xu, N. Using Broadband Crop Residue Angle Index to Estimate the Fractional Cover of Vegetation, Crop Residue, and Bare Soil in Cropland Systems. Remote Sens. Environ. 2020, 237, 111538. [Google Scholar] [CrossRef]
  10. Dolata, P.; Wróblewski, P.; Mrzygłód, M.; Reiner, J. Instance Segmentation of Root Crops and Simulation-Based Learning to Estimate Their Physical Dimensions for on-Line Machine Vision Yield Monitoring. Comput. Electron. Agric. 2021, 190, 106451. [Google Scholar] [CrossRef]
  11. Mishra, S.; Mishra, D.; Santra, G.H. Applications of Machine Learning Techniques in Agricultural Crop Production: A Review Paper. Indian J. Sci. Technol. 2016, 9, 1–14. [Google Scholar] [CrossRef]
  12. Song, H.; Wang, J.; Bei, J.; Wang, M. Modified Snake Optimizer Based Multi-Level Thresholding for Color Image Segmentation of Agricultural Diseases. Expert Syst. Appl. 2024, 255, 124624. [Google Scholar] [CrossRef]
  13. Shang, C.; Zhang, D.; Yang, Y. A Gradient-Based Method for Multilevel Thresholding. Expert Syst. Appl. 2021, 175, 114845. [Google Scholar] [CrossRef]
  14. Gupta, L.; Sortrakul, T. A Gaussian-Mixture-Based Image Segmentation Algorithm. Pattern Recognit. 1998, 31, 315–325. [Google Scholar] [CrossRef]
  15. Panjwani, D.K.; Healey, G. Markov Random Field Models for Unsupervised Segmentation of Textured Color Images. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 939–954. [Google Scholar] [CrossRef]
  16. Chen, Y.; Cheng, N.; Cai, M.; Cao, C.; Yang, J.; Zhang, Z. A Spatially Constrained Asymmetric Gaussian Mixture Model for Image Segmentation. Inf. Sci. 2021, 575, 41–65. [Google Scholar] [CrossRef]
  17. Trombini, M.; Solarna, D.; Moser, G.; Dellepiane, S. A Goal-Driven Unsupervised Image Segmentation Method Combining Graph-Based Processing and Markov Random Fields. Pattern Recognit. 2023, 134, 109082. [Google Scholar] [CrossRef]
  18. Yang, Y.; Zhao, X.; Huang, M.; Wang, X.; Zhu, Q. Multispectral Image Based Germination Detection of Potato by Using Supervised Multiple Threshold Segmentation Model and Canny Edge Detector. Comput. Electron. Agric. 2021, 182, 106041. [Google Scholar] [CrossRef]
  19. Hashim, F.A.; Hussien, A.G. Snake Optimizer: A Novel Meta-Heuristic Optimization Algorithm. Knowledge-Based Syst. 2022, 242, 108320. [Google Scholar] [CrossRef]
  20. Ding, Z.; Li, C.; Huang, R.; Gatenby, C.J.; Metaxas, D.N.; Gore, J.C. A Level Set Method for Image Segmentation in the Presence of Intensity Inhomogeneities with Application to MRI. IEEE Trans. Image Process. 2011, 20, 2007–2016. [Google Scholar] [CrossRef]
  21. Wang, Z.; Wan, L.; Xiong, N.; Zhu, J.; Ciampa, F. Variational Level Set and Fuzzy Clustering for Enhanced Thermal Image Segmentation and Damage Assessment. NDT E Int. 2021, 118, 102396. [Google Scholar] [CrossRef]
  22. Yue, J.; Tian, Q. Estimating Fractional Cover of Crop, Crop Residue, and Soil in Cropland Using Broadband Remote Sensing Data and Machine Learning. Int. J. Appl. Earth Obs. Geoinf. 2020, 89, 102089. [Google Scholar] [CrossRef]
  23. Guerrero, J.M.; Pajares, G.; Montalvo, M.; Romeo, J.; Guijarro, M. Support Vector Machines for Crop/Weeds Identification in Maize Fields. Expert Syst. Appl. 2012, 39, 11149–11155. [Google Scholar] [CrossRef]
  24. Xu, J.; Zhou, S.; Xu, A.; Ye, J.; Zhao, A. Automatic Scoring of Postures in Grouped Pigs Using Depth Image and CNN-SVM. Comput. Electron. Agric. 2022, 194, 106746. [Google Scholar] [CrossRef]
  25. Wang, H.; Ma, Z.; Ren, Y.; Du, S.; Lu, H.; Shang, Y.; Hu, S.; Zhang, G.; Meng, Z.; Wen, C.; et al. Interactive Image Segmentation Based Field Boundary Perception Method and Software for Autonomous Agricultural Machinery Path Planning. Comput. Electron. Agric. 2024, 217, 108568. [Google Scholar] [CrossRef]
  26. Li, Y.; Zhao, H.; Qi, X.; Wang, L.; Li, Z.; Sun, J.; Jia, J. Fully Convolutional Networks for Panoptic Segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 214–223. [Google Scholar]
  27. Shaheed, K.; Mao, A.; Qureshi, I.; Kumar, M.; Hussain, S.; Ullah, I.; Zhang, X. DS-CNN: A Pre-Trained Xception Model Based on Depth-Wise Separable Convolutional Neural Network for Finger Vein Recognition. Expert Syst. Appl. 2022, 191, 116288. [Google Scholar] [CrossRef]
  28. Ronneberger, O.; Fischer, P.B.T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  29. Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
  30. Beeche, C.; Singh, J.P.; Leader, J.K.; Gezer, N.S.; Oruwari, A.P.; Dansingani, K.K.; Chhablani, J.; Pu, J. Super U-Net: A Modularized Generalizable Architecture. Pattern Recognit. 2022, 128, 108669. [Google Scholar] [CrossRef]
  31. Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An Improved YOLOv5 Model Based on Visual Attention Mechanism: Application to Recognition of Tomato Virus Disease. Comput. Electron. Agric. 2022, 194, 106780. [Google Scholar] [CrossRef]
  32. Chen, H.; He, Y.; Zhang, L.; Yao, S.; Yang, W.; Fang, Y.; Liu, Y.; Gao, B. A Landslide Extraction Method of Channel Attention Mechanism U-Net Network Based on Sentinel-2A Remote Sensing Images. Int. J. Digit. Earth 2023, 16, 552–577. [Google Scholar] [CrossRef]
  33. Yue, J.; Tian, Q.; Dong, X.; Xu, K.; Zhou, C. Using Hyperspectral Crop Residue Angle Index to Estimate Maize and Winter-Wheat Residue Cover: A Laboratory Study. Remote Sens. 2019, 11, 807. [Google Scholar] [CrossRef]
  34. Ding, Y.; Zhang, H.; Wang, Z.; Xie, Q.; Wang, Y.; Liu, L.; Hall, C.C. A Comparison of Estimating Crop Residue Cover from Sentinel-2 Data Using Empirical Regressions and Machine Learning Methods. Remote Sens. 2020, 12, 1470. [Google Scholar] [CrossRef]
  35. Yue, J.; Fu, Y.; Guo, W.; Feng, H.; Qiao, H. Estimating Fractional Coverage of Crop, Crop Residue, and Bare Soil Using Shortwave Infrared Angle Index and Sentinel-2 MSI. Int. J. Remote Sens. 2022, 43, 1253–1273. [Google Scholar] [CrossRef]
  36. Zhou, D.; Li, M.; Li, Y.; Qi, J.; Liu, K.; Cong, X.; Tian, X. Detection of Ground Straw Coverage under Conservation Tillage Based on Deep Learning. Comput. Electron. Agric. 2020, 172, 105369. [Google Scholar] [CrossRef]
  37. Torralba, A.; Russell, B.C.; Yuen, J. LabelMe: Online Image Annotation and Applications. Proc. IEEE 2010, 98, 1467–1484. [Google Scholar] [CrossRef]
  38. Zhang, S.; Zhang, C. Modified U-Net for Plant Diseased Leaf Image Segmentation. Comput. Electron. Agric. 2023, 204, 107511. [Google Scholar] [CrossRef]
  39. Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net Architecture Design for Medical Image Segmentation through the Lens of Transformers. Med. Image Anal. 2024, 97, 103280. [Google Scholar] [CrossRef]
  40. Qiang, J.; Liu, W.; Li, X.; Guan, P.; Du, Y.; Liu, B.; Xiao, G. Detection of Citrus Pests in Double Backbone Network Based on Single Shot Multibox Detector. Comput. Electron. Agric. 2023, 212, 108158. [Google Scholar] [CrossRef]
  41. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015; pp. 1–14. [Google Scholar] [CrossRef]
  42. Hattiya, T.; Dittakan, K.; Musikasuwan, S. Diabetic Retinopathy Detection Using Convolutional Neural Network: A Comparative Study on Different Architectures. Eng. Access 2021, 7, 50–60. [Google Scholar] [CrossRef]
  43. Lian, X.; Pang, Y.; Han, J.; Pan, J. Cascaded Hierarchical Atrous Spatial Pyramid Pooling Module for Semantic Segmentation. Pattern Recognit. 2021, 110, 107622. [Google Scholar] [CrossRef]
  44. Du, L.; Lu, Z.; Li, D. Broodstock Breeding Behaviour Recognition Based on Resnet50-LSTM with CBAM Attention Mechanism. Comput. Electron. Agric. 2022, 202, 107404. [Google Scholar] [CrossRef]
  45. Bai, J.; Li, Y.; Li, J.; Yang, X.; Jiang, Y.; Xia, S.-T. Multinomial Random Forest. Pattern Recognit. 2022, 122, 108331. [Google Scholar] [CrossRef]
  46. Dong, S. Multi Class SVM Algorithm with Active Learning for Network Traffic Classification. Expert Syst. Appl. 2021, 176, 114885. [Google Scholar] [CrossRef]
  47. Wang, P.; Fan, E.; Wang, P. Comparative Analysis of Image Classification Algorithms Based on Traditional Machine Learning and Deep Learning. Pattern Recognit. Lett. 2021, 141, 61–67. [Google Scholar] [CrossRef]
  48. Wang, C.; Antos, S.E.; Triveno, L.M. Automatic Detection of Unreinforced Masonry Buildings from Street View Images Using Deep Learning-Based Image Segmentation. Autom. Constr. 2021, 132, 103968. [Google Scholar] [CrossRef]
  49. Zhao, H.; Zheng, J.; Wang, Y.; Yuan, X.; Li, Y. Portrait Style Transfer Using Deep Convolutional Neural Networks and Facial Segmentation. Comput. Electr. Eng. 2020, 85, 106655. [Google Scholar] [CrossRef]
  50. Barbedo, J.G.A. Impact of Dataset Size and Variety on the Effectiveness of Deep Learning and Transfer Learning for Plant Disease Classification. Comput. Electron. Agric. 2018, 153, 46–53. [Google Scholar] [CrossRef]
  51. Ghasemi, F.; Mehridehnavi, A.; Pérez-Garrido, A.; Pérez-Sánchez, H. Neural Network and Deep-Learning Algorithms Used in QSAR Studies: Merits and Drawbacks. Drug Discov. Today 2018, 23, 1784–1790. [Google Scholar] [CrossRef]
  52. Liang, Y.; Li, S.; Yan, C.; Li, M.; Jiang, C. Explaining the Black-Box Model: A Survey of Local Interpretation Methods for Deep Neural Networks. Neurocomputing 2021, 419, 168–182. [Google Scholar] [CrossRef]
Figure 1. Study area and experimental sites. (a) Location of the study area. (b) Soybean experimental field. (c) Field “crop–crop residue–soil” digital images, fCR (crop residue coverage).
Figure 2. Original image and annotated image.
Figure 3. Methodology framework.
Figure 4. CCRSNet architecture.
Figure 5. mIoU and loss curves of CCRSNet semantic segmentation network with different backbone networks during calibration.
Figure 6. Visualization of segmentation results for processed images using the CCRSNet model with VGG16 as the backbone network.
Figure 7. Visualization of segmentation results for original images using the CCRSNet model with VGG16 as the backbone network.
Figure 8. Class activation mapping using the CCRSNet model with VGG16 as the backbone network.
Figure 9. Proportion extraction of crop, crop residues, and soil using digital images and deep learning based on the TVD and IVD datasets. (a) crop (TVD-vali). (b) crop residues (TVD-vali). (c) soil (TVD-vali). (d) crop (IVD). (e) crop residues (IVD). (f) soil (IVD).
Table 1. Annotation standards for agricultural crops, crop residue, and soil images.
Label | Category | Description
L0 | Crop | The green plants in farmland, mainly soybeans and a small amount of weeds.
L1 | Crop residue | The residue left after wheat harvesting, mainly consisting of stems; small residues may be labeled as soil.
L2 | Soil | Dry, moist, shaded, and illuminated soil; small pieces of soil may be labeled as crop residue under high crop residue coverage.
Table 2. Confusion matrix.
Actual \ Predicted | Positive | Negative
Positive | true positives (TPs) | false negatives (FNs)
Negative | false positives (FPs) | true negatives (TNs)
Table 3. Performance evaluation based on the TVD dataset.
Metric | Label | RF | SVM | CCRSNet (VGG16) | CCRSNet (ResNet50)
Recall | L0 (crop) | 99.11% | 82.24% | 99.21% | 99.37%
Recall | L1 (crop residue) | 62.55% | 45.56% | 90.41% | 92.93%
Recall | L2 (soil) | 82.87% | 4.8% | 92.67% | 91.41%
PA | – | 80.70% | 33.54% | 93.70% * | 93.54%
mIoU | – | 71.29% | 21.64% | 87.84% * | 87.72%
* indicates the highest mIoU and PA.
Table 4. Performance evaluation based on the IVD dataset.
Metric | Label | RF | SVM | CCRSNet (VGG16) | CCRSNet (ResNet50)
Recall | L0 (crop) | 98.22% | 87.87% | 98.56% | 98.51%
Recall | L1 (crop residues) | 64.35% | 45.56% | 94.27% | 95.43%
Recall | L2 (soil) | 78.83% | 4.85% | 96.16% | 94.46%
PA | – | 82.47% | 58.00% | 96.23% * | 96.19%
mIoU | – | 66.86% | 32.98% | 92.73% * | 92.65%
* indicates the highest mIoU and PA.
Table 5. Ablation experiments based on the TVD and IVD datasets.
Metric | Label | Exp. 1 TVD-vali | Exp. 1 IVD | Exp. 2 TVD-vali | Exp. 2 IVD | Exp. 3 TVD-vali | Exp. 3 IVD
Recall | L0 (crop) | 99.21% | 98.56% | 98.87% | 97.73% | 98.95% | 98.66%
Recall | L1 (crop residues) | 90.41% | 94.27% | 90.45% | 94.34% | 92.18% | 93.64%
Recall | L2 (soil) | 92.67% | 96.16% | 92.10% | 95.62% | 91.79% | 95.74%
PA | – | 93.70% * | 96.23% ** | 93.30% | 95.83% | 93.50% | 95.91%
mIoU | – | 87.84% * | 92.73% ** | 87.20% | 91.98% | 87.57% | 92.13%
* indicates the highest mIoU and PA based on the TVD validation dataset; ** indicates the highest mIoU and PA based on the IVD dataset.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

