Article

Cross-Modal Feature Fusion for Field Weed Mapping Using RGB and Near-Infrared Imagery

Xijian Fan, Chunlei Ge, Xubing Yang and Weice Wang

1 College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing 210037, China
2 Fujian Key Laboratory of Spatial Information Perception and Intelligent Processing, Yango University, Fuzhou 350015, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(12), 2331; https://doi.org/10.3390/agriculture14122331
Submission received: 4 November 2024 / Revised: 13 December 2024 / Accepted: 17 December 2024 / Published: 19 December 2024
(This article belongs to the Section Digital Agriculture)

Abstract

The accurate mapping of weeds in agricultural fields is essential for effective weed control and enhanced crop productivity. Moving beyond the limitations of RGB imagery alone, this study presents a cross-modal feature fusion network (CMFNet) designed for precise weed mapping by integrating RGB and near-infrared (NIR) imagery. CMFNet first applies color space enhancement and adaptive histogram equalization to improve the image brightness and contrast in both RGB and NIR images. Building on a Transformer-based segmentation framework, a cross-modal multi-scale feature enhancement module is then introduced, featuring spatial and channel feature interaction to automatically capture complementary information across the two modalities. The enhanced features are further fused and refined by integrating an attention mechanism, which reduces the background interference and enhances the segmentation accuracy. Extensive experiments conducted on two public datasets, the Sugar Beets 2016 and Sunflower datasets, demonstrate that CMFNet significantly outperforms CNN-based segmentation models in the task of weed and crop segmentation. The model achieved a mean Intersection over Union (mIoU) of 90.86% and 90.77%, along with a Mean Accuracy (mAcc) of 93.8% and 94.35%, on the two datasets, respectively. Ablation studies further validate that the proposed cross-modal fusion method provides substantial improvements over basic feature fusion methods, effectively localizing weed and crop regions across diverse field conditions. These findings underscore its potential as a robust solution for precise and adaptive weed mapping in complex agricultural landscapes.

1. Introduction

Weeds in agricultural fields are wild plants that grow independently of crops, creating significant challenges for agriculture [1,2]. These plants compete aggressively with crops for essential resources such as nutrients, water, and sunlight, which limits crop growth and results in lower yields. In addition, weeds serve as habitats for pests and pathogens, often acting as carriers for viruses and bacteria that increase the risk of crop diseases. The presence of weeds also alters soil properties—physically, chemically, and biologically—thereby reducing soil fertility and aeration, which hinders crop health [3]. Thus, effective weed control is a critical measure for improving field productivity and maintaining sustainable agricultural practices.
Current mainstream weed control methods are generally divided into two stages: pre-growth prevention and post-growth weeding [4]. Pre-growth preventive measures include physical control, which involves obstructing weed growth through physical means like laying plastic mulch or shading nets, and ecological control, which seeks to modify soil or adjust cropping systems to create an environment unfavorable for weed growth [5]. Post-growth weeding methods include mechanical weeding, biological control, and chemical herbicides. Mechanical weeding, for instance, utilizes machinery like tractors or weeding machines; while effective, it requires large, flat fields and may risk injuring crops or struggle with smaller, hard-to-remove weeds [6]. Manual weeding is flexible but highly labor-intensive, requiring substantial labor costs and training. Biological control, which relies on natural enemies or microbial resources to suppress weed growth, is environmentally friendly but typically region-specific and lacks broad applicability [7]. Among these, chemical herbicides are widely adopted due to their ease of application, labor and time savings, high efficiency, rapid action, and cost-effectiveness [8]. However, conventional chemical weed control often involves broad application across entire farm areas, which can harm crops, leave pesticide residues on crop surfaces, and, over time, increase weed resistance to herbicides. This can also result in soil and water contamination due to excessive use. Therefore, precisely mapping weeds in fields, coupled with tailored weed control strategies, is critical for minimizing environmental impact and enhancing the effectiveness of weed management practices [9,10].
Traditional weed mapping methods involve image processing techniques [11,12]. Hlaing et al. [11] employed the Excess Green grayscale method to binarize color images, converting them to grayscale to enhance the contrast between crops and other areas. This was combined with a regional threshold segmentation approach, effectively achieving the separation of crops from weeds. Similarly, Hiremath et al. [12] proposed a texture feature segmentation algorithm based on the Markov Random Field (MRF) model. This algorithm precisely defines the spatial context and probabilistic interactions between pixels, enabling effective handling of spectral and textural uncertainties in images. Even under natural light conditions, this method achieved a high segmentation accuracy of 97.8% for specific weeds. On the other hand, Guijarro et al. [13] proposed an image segmentation strategy based on discrete wavelet transform, leveraging the high spatial variability and irregular, random distribution of weeds and crops in agricultural imagery. The strategy first extracts vegetation information using the Normalized Difference Vegetation Index (NDVI) [14] to highlight green vegetation areas. Then, it employs wavelet transform to extract spatial structural information in different orientations. Texture descriptors are utilized to capture the spatial variability within these bands, combining greenness and texture information to enhance crop and weed differentiation. The above image processing methods rely on threshold segmentation techniques for weed segmentation, offering a relatively simple structural approach suited to images with distinct grayscale differences. However, these methods face limitations in more complex scenes.
With the development of machine learning algorithms, Support Vector Machines (SVMs) with stronger generalization capabilities have been increasingly applied to more complex foreground–background segmentation tasks [15,16]. SVMs are well suited to learning intricate spatial relationships and handling images with highly nonlinear features or requiring multi-feature relationships. Le et al. [15], for instance, combined scale-invariant Local Binary Pattern (LBP) texture features with an SVM classifier, successfully differentiating corn from specific types of weeds. Addressing the challenge in humid environments where vegetation adheres to soil, causing greenness-based recognition methods to fail, Guerrero et al. [16] initially used Otsu’s adaptive thresholding to separate vegetation from soil and then applied an SVM to classify crops and weeds. They calculated the average of the relevant support vectors to define the boundary between the weeds and unshielded corn. These approaches underscore the potential of SVMs and color transformation techniques in weed segmentation, especially under challenging conditions where traditional methods struggle.
In recent years, deep learning technologies represented by convolutional neural networks (CNNs) and Transformer models have advanced rapidly, gaining widespread application in computer vision [17,18] and smart agriculture [19,20], and exhibiting promising potential that surpasses traditional machine learning approaches [21]. Due to their exceptional capabilities, deep learning technologies have also been applied to weed mapping [22,23]. McCool et al. [24] developed a hybrid model by integrating Inception blocks—lightweight deep convolutional modules—to create a framework suitable for deployment on agricultural robots designed for weed management. This model strikes an effective balance between accuracy and memory usage, making it practical for real-time applications. Champ et al. [25] applied Mask R-CNN, a region-based convolutional neural network model, on a robotic weed instance segmentation platform, achieving the precise identification of weeds. Zou et al. [26] introduced an augmentation technique based on synthesizing pretrained samples, which reduces the need for manual labeling, a common challenge in field weed segmentation tasks.
The aforementioned methods predominantly leverage CNNs, taking advantage of their strong representational power to capture local spatial features effectively through convolutional kernels. However, CNN-based networks face limitations in utilizing contextual information, especially in scenarios where target objects are spatially dispersed, as often encountered in weed mapping tasks. This constraint arises because CNNs, by design, focus on localized receptive fields, which can hinder the network’s ability to grasp long-range dependencies and holistic spatial relationships within the image. In contrast, Transformer-based architectures [27] can effectively leverage attention mechanisms to capture global semantic information, establishing long-range dependencies between objects. This capability makes them particularly suitable for handling field crop/weed images where target distributions are sparse.
Beyond using RGB imagery, a few studies indicate that near-infrared (NIR) imagery can reveal differences in thermal reflectance caused by variations in plant physiology—differences that are often undetectable in RGB images [28,29]. Wang et al. [28] employed an image fusion algorithm based on the second-generation Curvelet transform [29] to merge RGB and infrared images, effectively preserving directional features. This approach is especially advantageous for distinguishing weeds by their geometric shapes and represents one of the earliest attempts to incorporate infrared imaging into weed segmentation. Fawakherji et al. [30] applied Generative Adversarial Networks (GANs) to synthesize four-channel images incorporating an infrared channel, significantly enhancing vegetation detection accuracy and reducing the dependency on manually annotated weed masks for agricultural robotics; the segmentation performance obtained with synthetic images even surpassed that with real images. Xu et al. [31] developed a weed mapping method using multi-source remote sensing data captured by drones and machine learning techniques. This method extracts features from RGB and multispectral images through a three-step process, leveraging diverse data sources for enhanced weed detection. These studies indicate that integrating NIR imagery holds potential to enhance the weed mapping performance by providing additional spectral information that highlights subtle physiological contrasts between crops and weeds. However, these studies primarily use simple fusion strategies (i.e., concatenation) to integrate RGB and NIR imagery, which may fail to capture the intricate complementary information between the two modalities [32]. Therefore, designing an effective fusion architecture for RGB and NIR imagery in crop and weed mapping remains a key area for further exploration.
In this paper, to address limitations in current deep learning approaches for weed mapping, we propose a Transformer-based cross-modality fusion network, CMFNet, designed to improve the fusion of RGB and near-infrared (NIR) imagery for accurate field mapping of crops and weeds. To enhance the brightness and contrast of RGB and NIR images, color space enhancement and adaptive histogram equalization techniques are employed. In addition, a Transformer-based feature extraction network is implemented to strengthen the model’s capacity to detect sparsely distributed weeds in field images. Building on this foundation, we introduce a multi-scale feature fusion module to exploit the complementary information in RGB and NIR imagery effectively. To optimize the interaction between these two modalities, a cross-modal feature enhancement module is proposed, facilitating a richer integration of the information contained in RGB and NIR features. The features are further fused and refined through a fusion refinement module, which helps mitigate background interference from soil and other vegetation.

2. Materials and Methods

2.1. Dataset and Preprocessing

To validate the effectiveness of the proposed framework, two datasets were used for experimental analysis: the Sugar Beets 2016 dataset [33] and the Sunflower dataset [30]. Both datasets were collected using field robots equipped with sensors, capturing multi-modal images of crops and weeds across varied field conditions.
The Sugar Beets 2016 dataset [33] was collected over a three-month period during spring 2016 at a large sugar beet farm on the Klein-Altendorf campus of the University of Bonn, Germany, by the StachnissLab Photogrammetry and Robotics Laboratory using the BoniRob agricultural robot. During data collection, BoniRob was equipped with multiple sensors, including a JAI AD-130GE camera capable of capturing both RGB and NIR spectrum images. To minimize adverse effects from natural light scattering, BoniRob operated under artificial lighting, isolating it from ambient light. A camera mounted on the robot’s underside recorded RGB and NIR images, along with other sensor data, at a standardized resolution of 1296 × 966 pixels. The dataset contains two main categories: sugar beet seedlings and associated weeds, with weeds classified without species differentiation. The temporal scope extends from the germination of sugar beet seedlings to their rapid growth stage. Each image is paired with a segmentation mask, annotated by the data providers, as illustrated in Figure 1a. After filtering out images lacking labels or containing negligible segmentation targets, 2600 paired sets of RGB and NIR images, along with annotated masks, were selected. This curated dataset covers all stages of sugar beet growth, ensuring balanced and adequate sample representation for each growth phase. The dataset is divided into a training set of 2000 pairs and a test set of 600 pairs for experiments.
The Sunflower dataset [30] was created by the ROCOCO Laboratory at Sapienza University of Rome, Italy, in spring 2016. The dataset was recorded at a sunflower farm operated by the Agri-Food Services Agency (ASSAM) in Jesi, Ancona Province, Italy, covering the full period from crop emergence through the effective duration of chemical treatments. Image collection was conducted using a JAI AD-13 camera mounted on an agricultural robot, capturing four-channel images (RGB+NIR) with a resolution of 1296 × 964 pixels, as shown in Figure 1b. The dataset includes 500 images in total, providing RGB images, NIR images, color masks, and single-channel masks, with pixel-level annotations for crop, weed, and soil classes conducted by the authors’ team. This dataset was specifically designed for weed segmentation tasks in sunflowers, aimed at addressing selective weeding challenges in agricultural robotics. In the experiments, 400 images and 100 images are used for training and testing, respectively.

2.2. Data Augmentation

To address the issues of low brightness and weak contrast in the RGB and infrared images of the experimental datasets, data augmentation methods beyond standard geometric transformations and cropping were applied. Specifically, we used the Albumentations library to perform color space transformations such as HSV enhancement [34], together with Contrast-Limited Adaptive Histogram Equalization (CLAHE) [35], for image enhancement.
The HSV augmentation method enriches the color variety in RGB images and enhances the contrast in infrared images, thereby revealing more image details and improving model robustness against lighting variations and environmental noise. CLAHE is a variant of adaptive histogram equalization that prevents common issues associated with excessive contrast enhancement. The CLAHE process begins by dividing the original image into small blocks and independently calculating the grayscale histogram for each. A clip limit is set to trim histogram counts that exceed this threshold, and the excess is evenly redistributed to less-frequent counts in the histogram. Finally, CLAHE equalizes the trimmed histogram, mapping the original pixel values within each block to new values, enhancing the local contrast of the image. To address potential discontinuities at block boundaries caused by independent processing, bilinear interpolation is applied between blocks to smooth the entire image, as shown in the data augmentation example in Figure 2.
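As a concrete illustration, the sketch below shows how such a photometric enhancement step could be assembled with Albumentations; the shift limits, clip limit, tile grid size, and probabilities are assumptions for illustration rather than the exact settings used in this study.

```python
import albumentations as A

# Photometric enhancement pipeline for the RGB image; the segmentation mask passes
# through unchanged because only pixel-level (non-geometric) transforms are used here.
enhance = A.Compose([
    A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20,
                         val_shift_limit=20, p=0.5),          # HSV enhancement
    A.CLAHE(clip_limit=2.0, tile_grid_size=(8, 8), p=0.5),    # contrast-limited AHE
])

# Usage (uint8 inputs): augmented = enhance(image=rgb_image, mask=label_mask)
# Geometric transforms (flips, crops) would be placed in a separate Compose shared by
# the RGB image, NIR image, and mask so that the modalities stay spatially aligned.
```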

2.3. Methods

The CMFNet model adopts a dual-branch architecture, employing the Segformer [36] model as the feature extraction network for both RGB and NIR images, and the overall structure is shown in Figure 3. This design aims to simultaneously capture texture, color, and contextual semantic features from crop and weed targets unique to each modality. Segformer consists of four multi-layered Vision Transformer (ViT) encoder modules in sequence, which extract multi-scale features by using different transformer blocks. Multi-modal feature fusion is implemented for each Transformer block, where the cross-modal feature enhancement module (CMFEM) and Fusion Refinement Module (FRM) are sequentially connected to facilitate interaction between RGB and NIR modalities, enhancing the integration of complementary information across the multi-modal features.
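To make the overall data flow concrete, the following PyTorch-style skeleton (a structural sketch under our own naming, not the authors' released code) shows how two SegFormer-style encoders, a per-stage CMFEM and FRM, and a lightweight decoder could be wired together; `rgb_encoder`, `nir_encoder`, `cmfems`, `frms`, and `decoder` are placeholders supplied by the caller.

```python
import torch.nn as nn

class CMFNetSkeleton(nn.Module):
    """Structural sketch of the dual-branch design described above: two SegFormer-style
    encoders, one CMFEM + FRM pair per encoder stage, and a lightweight decoder over
    the refined fused features."""

    def __init__(self, rgb_encoder, nir_encoder, cmfems, frms, decoder):
        super().__init__()
        self.rgb_encoder = rgb_encoder        # returns 4 multi-scale feature maps
        self.nir_encoder = nir_encoder
        self.cmfems = nn.ModuleList(cmfems)   # one cross-modal enhancement module per stage
        self.frms = nn.ModuleList(frms)       # one fusion refinement module per stage
        self.decoder = decoder                # lightweight MLP decoder

    def forward(self, rgb, nir):
        rgb_feats = self.rgb_encoder(rgb)     # [f1, f2, f3, f4] at strides 4/8/16/32
        nir_feats = self.nir_encoder(nir)
        fused = []
        for f_rgb, f_nir, cmfem, frm in zip(rgb_feats, nir_feats, self.cmfems, self.frms):
            f_rgb_e, f_nir_e = cmfem(f_rgb, f_nir)   # cross-modal feature enhancement
            fused.append(frm(f_rgb_e, f_nir_e))      # attention-refined fusion
        return self.decoder(fused)                   # per-pixel class logits
```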

2.3.1. Segformer Network

SegFormer is an efficient semantic segmentation model based on the Transformer architecture. Compared to convolutional neural network (CNN)-based segmentation models, Transformer-based segmentation models are more adept at handling long-range dependencies and capturing global features, which are crucial for pixel-level classification tasks like semantic segmentation. SegFormer consists of two main components: (1) a Mix Transformer (MiT) encoder, designed for learning multi-scale features across different image resolutions; (2) a lightweight MLP decoder, which fuses multi-level features to generate the final semantic segmentation mask. In this decoder, the number of feature map channels corresponds to the number of segmentation classes. This architecture allows SegFormer to effectively capture both detailed local and broad global information, enhancing its performance in complex segmentation tasks.
MiT encoder: Unlike the standard ViT encoder, which generates single-resolution feature maps, the MiT encoder in SegFormer introduces a multi-layer Transformer feature extraction module that produces CNN-like hierarchical features for a given input image (shown in Figure 4). These multi-level features provide both global semantic information and local fine-grained details, essential for complex segmentation tasks. To maintain local continuity around blocks and reduce localized artifact effects, an overlapping patch merging strategy is employed. The overlapping regions between adjacent patches expand the receptive field and preserve the local information within each patch. Each Transformer feature extraction layer includes a multi-head self-attention module, where each head encodes a query vector (Q), a key vector (K), and a value vector (V), all with identical dimensions. The output of the multi-head self-attention module is processed by a Mixed Feed Forward Network (Mix-FFN) for feature encoding, as shown in Figure 4. The Mix-FFN enhances positional awareness by adding convolution layers before the MLP layer, which mitigates the effects of zero-padding on spatial information leakage. Next, the GELU activation function is applied to introduce nonlinearity, and the features are passed through a multi-layer perceptron (MLP) for the final output. This design enables the MiT encoder to capture extensive spatial and semantic information effectively across multiple scales.
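For reference, the Mix-FFN block described above can be sketched as follows; this follows the published SegFormer design (linear layer, 3 × 3 depthwise convolution, GELU, linear layer), with the hidden expansion ratio of 4 being an assumption.

```python
import torch.nn as nn

class MixFFN(nn.Module):
    """Sketch of a Mix-FFN block: a 3x3 depthwise convolution between two linear layers
    injects local positional information without explicit positional encodings."""

    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):                       # x: (B, N, C) token sequence, N = H*W
        x = self.fc1(x)
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)     # tokens -> feature map
        x = self.dwconv(x)                            # depthwise conv supplies positional cues
        x = x.flatten(2).transpose(1, 2)              # feature map -> tokens
        return self.fc2(self.act(x))
```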
MLP decoder: The decoder consists of a simple four-layer MLP structure, eliminating the need for manual design and reducing computational complexity. It operates in four main steps: multi-level features from the MiT encoder are each passed through individual MLP layers, mapping them to a unified channel dimension. These features are then upsampled to the spatial resolution of the highest level, equivalent to one-fourth of the original input size, and concatenated along the channel dimension. A lightweight MLP processes the concatenated features, enabling the effective fusion of information across different feature levels. Finally, a linear layer maps the fused features to the number of target classes, generating the segmentation mask predictions. This streamlined approach allows the decoder to efficiently integrate and utilize multi-scale feature information for precise segmentation while maintaining simplicity and efficiency.
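A minimal sketch of such an all-MLP decoder is given below; the input channel widths correspond to the MiT-B0 encoder used later in this paper, while the embedding dimension of 256 and the use of 1 × 1 convolutions as per-pixel linear layers are implementation assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPDecoder(nn.Module):
    """Sketch of the lightweight all-MLP decoder: project each encoder stage to a common
    width, upsample to the 1/4-scale resolution, concatenate, fuse, and predict logits."""

    def __init__(self, in_dims=(32, 64, 160, 256), embed_dim=256, num_classes=3):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(d, embed_dim, 1) for d in in_dims)
        self.fuse = nn.Sequential(
            nn.Conv2d(embed_dim * len(in_dims), embed_dim, 1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        self.classify = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):                          # list of 4 maps, finest first
        target = feats[0].shape[2:]                    # 1/4-scale spatial size
        ups = [F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.classify(self.fuse(torch.cat(ups, dim=1)))   # (B, classes, H/4, W/4)
```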

2.3.2. Cross-Modal Feature Enhancement Module

In multi-modal learning, data from different modalities typically carry unique attributes and information. While these multi-modal data can complement each other to enhance overall information completeness, they may also introduce noise into the fused features. To address this, inspired by [37], we propose a cross-modal feature enhancement module (CMFEM) based on cross-modal interaction. This module separately extracts spatial and channel weights from each modality, enhancing the features of the other modality. As shown in Figure 5, the CMFEM is divided into two components: the Channel Feature Enhancement Module and the Spatial Feature Enhancement Module. The calculated enhancement weights are applied in the final cross-modal interaction, optimizing the overall feature representation. This design ensures that each modality’s distinctive information is maximized while minimizing interference, thus facilitating a more robust and comprehensive multi-modal fusion.
In the Channel Feature Enhancement Module, features from the RGB and NIR branches are first concatenated, resulting in a feature map with a channel dimension of C. This feature map is then passed through both average pooling and max pooling layers, capturing a more comprehensive feature description. The pooled results are subsequently flattened and merged to form a one-dimensional feature vector with 2C channels. This vector is then input into an MLP with a single linear layer, and the output is normalized to a range of 0 to 1 by a Sigmoid activation layer. The resulting weights indicate the importance of each channel, allowing for the enhancement of channels deemed significant. The process is computed as
$W_{RGB}^{C},\ W_{NIR}^{C} = F_{\mathrm{split}}\big(\sigma\big(F_{\mathrm{mlp}}(Y)\big)\big)$
where $W_{RGB}^{C}$ and $W_{NIR}^{C}$ denote the channel enhancement weights associated with the RGB and NIR branches, respectively, $\sigma$ denotes the Sigmoid activation layer, $F_{\mathrm{split}}$ represents splitting along the channel dimension, $F_{\mathrm{mlp}}$ represents the multi-layer perceptron, and $Y$ is the concatenated RGB and NIR feature map.
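A minimal PyTorch sketch of this channel branch is shown below (our own illustrative reconstruction, not the authors' code); here `dim` denotes the channel width of each single-modality feature map, so the concatenated map has 2·dim channels and the pooled descriptor has 4·dim entries.

```python
import torch
import torch.nn as nn

class ChannelEnhancement(nn.Module):
    """Sketch of the channel branch of the CMFEM: concatenate RGB and NIR features,
    apply average and max pooling, pass the pooled descriptor through one linear
    layer, and split the sigmoid output into per-modality channel weights."""

    def __init__(self, dim):                            # dim = channels per modality
        super().__init__()
        self.mlp = nn.Linear(4 * dim, 2 * dim)          # 2*dim channels, doubled by avg+max pooling
        self.sigmoid = nn.Sigmoid()

    def forward(self, x_rgb, x_nir):                    # each: (B, dim, H, W)
        y = torch.cat([x_rgb, x_nir], dim=1)            # (B, 2*dim, H, W)
        avg = torch.mean(y, dim=(2, 3))                 # (B, 2*dim) global average pooling
        mx = torch.amax(y, dim=(2, 3))                  # (B, 2*dim) global max pooling
        w = self.sigmoid(self.mlp(torch.cat([avg, mx], dim=1)))   # (B, 2*dim)
        w_rgb, w_nir = w.chunk(2, dim=1)                # split into per-modality weights
        return (w_rgb.unsqueeze(-1).unsqueeze(-1),      # (B, dim, 1, 1)
                w_nir.unsqueeze(-1).unsqueeze(-1))
```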
The main purpose of the Spatial Feature Enhancement Module is to improve the representation of local spatial information. Initially, the feature maps from both modalities are concatenated into a single feature map Y, which is then processed through a 1 × 1 convolutional layer, followed by a ReLU activation layer, and then another 1 × 1 convolutional layer. This bottleneck structure effectively reduces computational costs while enhancing the model’s ability to capture complex and abstract features. Finally, the processed features are passed through a Sigmoid activation function to generate spatial enhancement weights, which modulate the feature responses in subsequent layers. The process is computed as
$W_{RGB}^{S},\ W_{NIR}^{S} = F_{\mathrm{split}}\big(\sigma\big(\mathrm{Conv}\big(\mathrm{ReLU}\big(\mathrm{Conv}(Y)\big)\big)\big)\big)$
where $W_{NIR}^{S}$ and $W_{RGB}^{S}$ represent the spatial enhancement weights applied to the NIR and RGB feature maps, respectively, and $\mathrm{Conv}$ denotes a 1 × 1 convolutional layer.
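The spatial branch can be sketched analogously; producing a two-channel map that is split into one spatial weight per modality, and the bottleneck reduction factor of 4, are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpatialEnhancement(nn.Module):
    """Sketch of the spatial branch of the CMFEM: a 1x1 conv -> ReLU -> 1x1 conv
    bottleneck over the concatenated features, followed by a sigmoid, split into
    one spatial weight map per modality."""

    def __init__(self, dim, reduction=4):               # dim = channels per modality
        super().__init__()
        self.bottleneck = nn.Sequential(
            nn.Conv2d(2 * dim, (2 * dim) // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d((2 * dim) // reduction, 2, kernel_size=1),   # one map per modality
            nn.Sigmoid(),
        )

    def forward(self, x_rgb, x_nir):                    # each: (B, dim, H, W)
        y = torch.cat([x_rgb, x_nir], dim=1)            # (B, 2*dim, H, W)
        w = self.bottleneck(y)                          # (B, 2, H, W)
        return w[:, 0:1], w[:, 1:2]                     # (B, 1, H, W) spatial weights
```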
The spatial enhancement weights and channel enhancement weights calculated through the aforementioned processes are applied to the feature maps of both modalities using element-wise multiplication. These enhanced feature maps are then combined with the original feature maps of their respective modalities via residual connections, forming the final output. Additionally, hyper-parameters $\lambda_C$ and $\lambda_S$ are introduced to control the influence of the channel and spatial enhancement weights, respectively. By default, both parameters are set to 0.5 to ensure that equal importance is assigned to both modalities. The enhanced feature maps can be expressed as follows:
$RGB_{\mathrm{out}} = X_{RGB} \oplus \big(\lambda_C\, W_{NIR}^{C} \times X_{NIR}\big) \oplus \big(\lambda_S\, W_{NIR}^{S} \otimes X_{NIR}\big)$
$NIR_{\mathrm{out}} = X_{NIR} \oplus \big(\lambda_C\, W_{RGB}^{C} \times X_{RGB}\big) \oplus \big(\lambda_S\, W_{RGB}^{S} \otimes X_{RGB}\big)$
where $\oplus$ denotes element-wise addition, $\times$ represents channel-wise multiplication, and $\otimes$ refers to spatial-wise multiplication.
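Combining the two branches, the enhancement and residual step defined by the equations above can be sketched as follows, reusing the ChannelEnhancement and SpatialEnhancement sketches given earlier; this is again an illustrative reconstruction rather than the authors' implementation.

```python
import torch.nn as nn

class CMFEM(nn.Module):
    """Sketch of the full cross-modal feature enhancement step: channel and spatial
    weights computed from the concatenated features modulate the opposite branch's
    contribution, which is added back to each modality via residual connections
    (lambda_c = lambda_s = 0.5, as in the text)."""

    def __init__(self, dim, lambda_c=0.5, lambda_s=0.5):
        super().__init__()
        self.channel = ChannelEnhancement(dim)          # defined in the earlier sketch
        self.spatial = SpatialEnhancement(dim)          # defined in the earlier sketch
        self.lambda_c, self.lambda_s = lambda_c, lambda_s

    def forward(self, x_rgb, x_nir):
        wc_rgb, wc_nir = self.channel(x_rgb, x_nir)     # (B, C, 1, 1) channel weights
        ws_rgb, ws_nir = self.spatial(x_rgb, x_nir)     # (B, 1, H, W) spatial weights
        rgb_out = x_rgb + self.lambda_c * wc_nir * x_nir + self.lambda_s * ws_nir * x_nir
        nir_out = x_nir + self.lambda_c * wc_rgb * x_rgb + self.lambda_s * ws_rgb * x_rgb
        return rgb_out, nir_out
```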

2.3.3. Fusion Refinement Module

The attention mechanism guides the model to focus on critical parts of the image, as required by the task. The Convolutional Block Attention Module (CBAM) [38], known for its simple structure and computational efficiency, is widely used across various visual processing tasks. In this study, the Fusion Refinement Module (FRM) is designed by integrating CBAM with the enhanced features (both RGB and NIR) from the CMFEM, which improves the robustness to complex field backgrounds while emphasizing weed and crop targets (shown in Figure 6). The CBAM attention is placed after the cross-modal feature fusion module, processing the enhanced features $RGB_{\mathrm{out}}$ and $NIR_{\mathrm{out}}$. In the FRM, $RGB_{\mathrm{out}}$ and $NIR_{\mathrm{out}}$ are first combined through element-wise addition to form a fused feature map, $Fusion_{\mathrm{in}}$. This map then sequentially passes through a channel attention module and a spatial attention module. In the channel attention module, global average pooling and global max pooling are used to capture the spatial information of each channel. The pooled results are processed by an MLP with shared parameters, and the Sigmoid function then activates the output, generating a channel attention map that reflects each channel’s significance. The process is computed as
$M_c = \mathrm{CA}(Fusion_{\mathrm{in}}) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(Fusion_{\mathrm{in}})) + \mathrm{MLP}(\mathrm{MaxPool}(Fusion_{\mathrm{in}}))\big),$
where $\sigma$ denotes the Sigmoid function. This attention map is multiplied with $Fusion_{\mathrm{in}}$ to yield the channel-refined feature map $Fusion_{\mathrm{ca}}$:
$Fusion_{\mathrm{ca}} = M_c \otimes Fusion_{\mathrm{in}}.$
In the spatial attention module, the average and max values are computed along the channel dimension, capturing context features and prominent texture information. These pooled results are concatenated along the channel dimension and processed by a convolution to reduce dimensionality and integrate information, producing a single-channel spatial attention map with global contextual awareness, to which a Sigmoid activation is then applied. The process is computed as
$M_S = \mathrm{SA}(Fusion_{\mathrm{ca}}) = \sigma\big(\mathrm{Conv}\big([Fusion_{\mathrm{ca}}^{\mathrm{max}};\ Fusion_{\mathrm{ca}}^{\mathrm{avg}}]\big)\big).$
The spatial attention map $M_S$ and $Fusion_{\mathrm{ca}}$ are element-wise multiplied to obtain the final attention-refined fusion feature map $Fusion_{\mathrm{out}}$:
$Fusion_{\mathrm{out}} = M_S \otimes Fusion_{\mathrm{ca}}.$
The refined fusion features from different transformer blocks are fed into MLP layers and subsequently concatenated to ensure dimension compatibility.
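The FRM can be sketched as a CBAM-style refinement of the summed enhanced features, as below; the reduction ratio of 16 and the 7 × 7 spatial kernel follow the original CBAM paper and are assumptions here rather than settings reported in this study.

```python
import torch
import torch.nn as nn

class FusionRefinementModule(nn.Module):
    """Sketch of the FRM: sum the two enhanced feature maps, then refine them with
    CBAM-style channel attention followed by spatial attention."""

    def __init__(self, dim, reduction=16, kernel_size=7):
        super().__init__()
        self.shared_mlp = nn.Sequential(                 # shared across avg- and max-pooled inputs
            nn.Conv2d(dim, dim // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1, bias=False),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, rgb_out, nir_out):
        x = rgb_out + nir_out                                        # Fusion_in
        # Channel attention: shared MLP over globally avg- and max-pooled descriptors.
        mc = self.sigmoid(self.shared_mlp(x.mean(dim=(2, 3), keepdim=True)) +
                          self.shared_mlp(x.amax(dim=(2, 3), keepdim=True)))
        x = mc * x                                                   # Fusion_ca
        # Spatial attention: channel-wise average and max maps, concatenated and convolved.
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        ms = self.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return ms * x                                                # Fusion_out
```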

3. Experiments and Discussion

3.1. Experimental Setting

To ensure experimental fairness and data comparability, the CMFNet model and all comparative methods were tested on the following hardware configuration: Intel(R) Core(TM) i9-14900KF 3.20 GHz CPU, GeForce RTX 4090 GPU with 24 GB memory, 64 GB RAM, and 2 TB storage. The software environment comprised Windows 11 Pro, Python 3.9, and the PyTorch 2.1 framework, with NVIDIA CUDA 12.1 utilized for accelerated training. During training, all models employed a warm-up strategy: the learning rate was increased linearly from a low initial value to the set starting rate over the first 10 epochs. The other hyper-parameters are set as shown in Table 1.
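For illustration, the optimizer and linear warm-up described above could be configured as in the sketch below, using the Table 1 values (AdamW, learning rate 0.00006, weight decay 0.01, momentum 0.9 mapped to the first AdamW beta) and a 10-epoch warm-up; the initial warm-up factor and the per-step schedule granularity are assumptions, and `model` and `steps_per_epoch` are placeholders.

```python
import torch

def build_optimizer_and_warmup(model, steps_per_epoch, warmup_epochs=10):
    """Sketch of AdamW plus a linear learning-rate warm-up over the first 10 epochs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5,
                                  betas=(0.9, 0.999), weight_decay=0.01)
    warmup_steps = warmup_epochs * steps_per_epoch

    def lr_lambda(step):
        if step < warmup_steps:
            return 0.1 + 0.9 * step / warmup_steps   # linear ramp toward the base lr
        return 1.0                                    # base lr after warm-up

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Usage: call scheduler.step() once per training iteration so the ramp is per-step.
```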
All experiments were conducted using the Sugar Beets 2016 and Sunflower datasets. The proposed method leverages the lightweight Segformer-B0 pretrained weights to initialize training, which accelerates convergence. For comparison, all CNN-based methods used ResNet-101 pretrained weights for initialization, ensuring a consistent baseline across models.

3.2. Evaluation Metric

This experiment uses commonly applied semantic segmentation metrics to quantitatively assess the performance of the proposed method. The primary metrics are Mean Intersection over Union (mIoU) and Mean Accuracy (mAcc). The mIoU is calculated by measuring the ratio of the Intersection over Union between the predicted and true values for each class, and then averaging these ratios across all classes, while the mAcc computes the ratio of correctly classified pixels for each class, averaging these ratios to yield an overall accuracy metric. The metrics of mIoU and mAcc are computed as
$\mathrm{mIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i}$
$\mathrm{mAcc} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i + TN_i}{TP_i + FP_i + TN_i + FN_i}$
where TP represents the number of samples correctly predicted as positive, FP represents the number of samples incorrectly predicted as positive, TN denotes the count of samples accurately predicted as negative, FN indicates the number of samples incorrectly predicted as negative, and N denotes the total number of classes. In this experiment, a confusion matrix is constructed for three classes: crops (sugar beet and sunflower), weeds, and background. This matrix facilitates a comparison between predictions and ground truth on the test set, enabling the calculation and statistical analysis of the performance across all the classes using the aforementioned metrics.
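Both metrics can be computed directly from the three-class confusion matrix, as in the short sketch below (an illustrative reimplementation of the formulas above, not the evaluation code used in the experiments).

```python
import numpy as np

def miou_macc(conf):
    """Compute mIoU and mAcc from an N x N confusion matrix
    (rows = ground truth, columns = prediction)."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp      # pixels predicted as class i but belonging elsewhere
    fn = conf.sum(axis=1) - tp      # class-i pixels predicted as something else
    tn = conf.sum() - tp - fp - fn
    iou = tp / (tp + fp + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return iou.mean(), acc.mean()

# Example with a hypothetical 3-class (crop, weed, background) confusion matrix:
# conf = np.array([[950, 20, 30], [15, 880, 40], [25, 35, 1005]])
# print(miou_macc(conf))
```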

3.3. Comparative Experiment

To validate the effectiveness of the proposed CMFNet method for crop and weed recognition in agricultural fields, this section presents quantitative comparative experiments against several models. These include five popular CNN-based semantic segmentation models: DeepLab V3+ [39], PSANet [40], GCNet [41], ISANet [42], and OCRNet [43], all of which use RGB images as input. In addition, the Transformer-based semantic segmentation model Segformer is tested under various input conditions, including single-modality RGB input, single-modality NIR input, and synthetic images generated using the ADF [44] fusion method. For fairness in experimentation, all comparative models are configured with identical parameter settings.
Table 2 presents the performance of each comparative model on the test dataset. The proposed dual-branch weed segmentation model, CMFNet, demonstrated the best overall results, achieving the highest segmentation metrics across both test sets. Specifically, CMFNet reached an mIoU of 90.86% and an mAcc of 93.8% on the sugar beet test set, and an mIoU of 90.77% and an mAcc of 94.35% on the sunflower test set. This performance represents a substantial improvement over the baseline Segformer model, which used only RGB images, validating the effectiveness of multi-modal imagery in weed segmentation tasks. It can be seen that traditional CNNs, such as DeepLab V3+, underperform compared to the baseline Segformer on the test data. This highlights the advantage of Transformer models in sparse distribution scenarios like field weed recognition, where they excel in handling global information and long-range dependencies. While CNNs enhance the learning capacity by increasing the network depth, this approach also significantly increases the model weights, creating challenges for training and deployment. Table 2 further reveals that the Segformer model using only NIR inputs shows a diminished performance due to the absence of the color and detail provided by RGB imagery. Although the Segformer model with ADF-fused inputs performed slightly better on the sunflower test set, it experienced a performance decline on the sugar beet test set, indicating the limitations of image-level fusion algorithms in consistently adapting to downstream task requirements across varied test conditions. This insight underscores the importance of optimizing fusion strategies for robust multi-modal application across different field environments.
The quantitative comparisons and visual results for the weed category are shown in Figure 7, Figure 8 and Figure 9. These figures illustrate that, despite incorporating attention mechanisms, CNN-based models like PSANet and ISANet fall short in weed recognition performance, lagging notably behind DeepLab V3+, which integrates the ASPP module. In contrast, the baseline Segformer model, leveraging its efficient Transformer-based attention mechanism, achieved an IoU of 68.59% and an accuracy (Acc) of 79.12% for the weed category on the sugar beet test set, outperforming other convolutional models. However, the performance gap between Segformer and other models narrowed on the sunflower dataset. Although the model’s performance improved with synthetic image inputs, its weed recognition results remained inconsistent. The proposed CMFNet model achieved an IoU over 76% and an Acc above 83% for the weed category on both test sets, outperforming all other models by at least five percentage points. This demonstrates CMFNet’s effectiveness in achieving refined segmentation for the weed category, highlighting its robustness and precision in complex agricultural environments.
Figure 9 shows the test results of the proposed CMFNet model, along with all comparative models and methods, on the Sugar Beets 2016 and Sunflower datasets. These results cover various growth dates and lighting conditions, offering a representative overview of the model performance. The visual results reveal that CMFNet effectively focuses on areas containing crop seedlings and weeds, producing segmentation results closely aligned with ground truth labels. Notably, CMFNet demonstrates a superior ability in the sunflower test set by clearly delineating details such as weed stems and leaf contours. Additionally, CMFNet rarely misclassifies crops as weeds, a common issue observed in other models. For example, CNN-based models like OCRNet often display overly rigid geometric shapes along crop and weed boundaries rather than natural, curved details, sometimes misidentifying crop regions as weeds. This effect may stem from CNNs’ tendency to focus on local spatial correlations, making them susceptible to neighboring spatial interference. In comparison, the Segformer model, benefiting from a broader receptive field enabled by its self-attention mechanism, shows fewer misclassifications between weeds and the background. However, it struggles with dense weed regions, producing fragmented segmentation results. While using synthetic images generated through CNN-based image fusion algorithms improves Segformer’s ability to identify weed stems and outlines, its segmentation completeness remains inferior to CMFNet’s. This discrepancy may be due to the limited interaction of modality-specific features in image-level fusion, with cross-modality noise affecting the results. These findings underscore that synthetic images as model input still have inherent limitations, reinforcing the advantage of cross-modal feature fusion as implemented in CMFNet for more effective segmentation in agricultural applications.

3.4. Ablation Study

To quantify the contribution of each proposed component of CMFNet, ablation experiments were conducted on both datasets, with the results shown in Table 3 and Table 4. Table 3 compares different fusion strategies: a baseline that fuses RGB and NIR features through direct convolutional concatenation, the CMFEM alone, the FRM alone, and the full CMFNet combining both modules. Table 4 examines the effect of the auxiliary loss ratio used during training. For fairness, all ablation variants are configured with identical training settings.
The ablation study shows that the modules of CMFEM and FRM play a crucial role in enhancing the complementarity and adaptability of features from different modalities. Compared to the direct convolutional fusion of RGB and NIR features, the CMFEM improves the model’s mIoU by 1.7% on the Sugar Beet dataset and by an even greater 4.42% on the Sunflower dataset. This notable difference is primarily due to the diverse lighting conditions in the Sunflower dataset; the CMFEM effectively adjusts RGB and NIR features to suppress noise interference and improve cross-modal feature interaction. However, this also slightly increases the model’s parameter count. On the other hand, the CBAM attention-based FRM achieves similar performance gains with minimal parameter growth. By explicitly modeling inter-channel relationships and key spatial regions, CBAM improves the network’s ability to focus on essential features and suppress irrelevant information in complex images. This makes the network more robust in handling diverse agricultural imagery. When combined, the CMFEM and FRM deliver the highest performance improvements, achieving an mIoU exceeding 90% and mAcc above 93%, while keeping model weight and parameter growth minimal. This balance of performance and resource efficiency highlights CMFNet’s capability in handling field weed segmentation tasks in complex agricultural environments through effective cross-modal feature fusion.
In this study, the widely used multi-class cross-entropy loss function is selected as the primary loss function for the CMFNet model in the semantic segmentation task. Given the imbalanced class distribution in the dataset, an auxiliary Dice Loss function is introduced to address this issue. An ablation study was conducted to determine the optimal weight for the auxiliary loss ratio. As shown in Table 4, when the auxiliary loss ratio λ is set to 0.2, the model achieves the best performance across the test sets, with improvements observed compared to configurations without auxiliary loss. However, when the auxiliary loss ratio exceeds this optimal value, the model’s performance begins to decline, even resulting in negative optimization effects. Although fine-tuning the auxiliary loss ratio requires careful adjustment, combining cross-entropy loss with Dice Loss effectively enhances the model performance, improving the segmentation accuracy and robustness in handling imbalanced data distributions.
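The combined objective can be sketched as follows; the soft-Dice formulation over one-hot targets and the smoothing constant are assumptions rather than the authors' exact implementation, while the auxiliary ratio of 0.2 follows the best setting in Table 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEDiceLoss(nn.Module):
    """Sketch of the combined objective: multi-class cross-entropy plus an auxiliary
    soft-Dice term weighted by the ratio lambda."""

    def __init__(self, num_classes=3, aux_ratio=0.2, smooth=1.0):
        super().__init__()
        self.num_classes = num_classes
        self.aux_ratio = aux_ratio
        self.smooth = smooth

    def forward(self, logits, target):                  # logits: (B, C, H, W); target: (B, H, W)
        ce = F.cross_entropy(logits, target)
        probs = torch.softmax(logits, dim=1)
        one_hot = F.one_hot(target, self.num_classes).permute(0, 3, 1, 2).float()
        inter = (probs * one_hot).sum(dim=(0, 2, 3))
        union = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
        dice = 1.0 - ((2 * inter + self.smooth) / (union + self.smooth)).mean()
        return ce + self.aux_ratio * dice               # cross-entropy + lambda * Dice
```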

4. Conclusions

RGB images capture 2D information such as color, shape, and texture, while NIR images primarily represent thermal radiation and are less susceptible to natural environmental changes. Though they convey semantic information at different scales, these modalities can complement each other, improving segmentation effectiveness. However, due to the substantial difference between the thermal emphasis of NIR images and the high spatial resolution and detailed texture of RGB images, simply adding or overlaying NIR and RGB features is inadequate. Therefore, based on the Segformer architecture, this study proposes a dual-branch semantic segmentation model, CMFNet, utilizing multi-modal image fusion. First, a dual-path feature extraction framework was constructed, forming CMFNet’s overall architecture. To efficiently extract and integrate features from RGB and NIR images, this study introduced the cross-modal feature enhancement module and the Fusion Refinement Module for effective cross-modal feature fusion. Finally, the model uses an MLP decoder to process multi-modal, multi-scale features for prediction. The results demonstrate that CMFNet effectively leverages the complementary features of multi-modal images, significantly improving crop and weed recognition accuracy. Based on the findings of this study, the near-infrared spectrum proves effective in distinguishing between weeds and crops, while transformer-based semantic segmentation methods can efficiently and robustly identify weed areas. In addition, the proposed method can serve as an effective tool to help farmers rapidly localize weed areas without the need for manual field inspections.

Author Contributions

Conceptualization, project administration and writing—original draft preparation, X.F.; software and visualization, C.G.; methodology and validation, X.Y.; Formal analysis, supervision and writing—review and editing, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Open Project of Fujian Key Laboratory of Spatial Information Perception and Intelligent Processing (Yango University, Grant NO. FKLSIPIP1017).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study are openly available. Please go to the websites https://www.ipb.uni-bonn.de/data/sugarbeets2016/index.html (accessed on 16 December 2024) and http://www.diag.uniroma1.it/labrococo/fsd/sunflowerdatasets.html (accessed on 16 December 2024) to access the Sugar Beets 2016 and Sunflower datasets, respectively.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chauhan, B.S. Grand challenges in weed management. Front. Agron. 2020, 1, 3. [Google Scholar] [CrossRef]
  2. Manisankar, G.; Ghosh, P.; Malik, G.C.; Banerjee, M. Recent trends in chemical weed management: A review. J. Pharm. Innov. 2022, 11, 745–753. [Google Scholar]
  3. Monteiro, A.; Santos, S. Sustainable Approach to Weed Management: The Role of Precision Weed Management. Agronomy 2022, 12, 118. [Google Scholar] [CrossRef]
  4. Rajat, S.; Asma, F. Effect of different weed control treatments on growth and yield of wheat. Int. J. Botany Stud. 2021, 6, 538–542. [Google Scholar]
  5. Kidd, P.; Mench, M.; Álvarez-López, V.; Bert, V.; Dimitrou, I.; Friesl-Hanl, W.; Herzig, R.; Janssen, J.O.; Kolbas, A.; Müller, I.; et al. Agronomic practices for improving gentle remediation of trace element-contaminated soils. Int. J. Phytoremediat. 2015, 17, 1005–1037. [Google Scholar] [CrossRef] [PubMed]
  6. McCool, C.; Beattie, J.R.; Firn, J.; Lehnert, C.; Kulk, J.; Bawden, O.; Russell, R.; Perez, T. Efficacy of mechanical weeding tools: A study into alternative weed management strategies enabled by robotics. IEEE Robot. Autom. Lett. 2018, 3, 1184–1190. [Google Scholar] [CrossRef]
  7. Abbas, T.; Zahir, Z.A.; Naveed, M.; Kremer, R.J. Limitations of existing weed control practices necessitate development of alternative techniques based on biological approaches. Adv. Agron. 2018, 147, 239–280. [Google Scholar]
  8. Dong, S.; Chen, T.; Xi, R.; Gao, S.; Li, G.; Zhou, X.; Song, X.; Ma, Y.; Hu, C.; Yuan, X. Crop Safety and Weed Control of Foliar Application of Penoxsulam in Foxtail Millet. Plants 2024, 13, 2296. [Google Scholar] [CrossRef]
  9. Sui, R.; Thomasson, J.A.; Hanks, J.; Wooten, J. Ground-based sensing system for weed mapping in cotton. Comput. Electron. Agric. 2008, 60, 31–38. [Google Scholar] [CrossRef]
  10. Panduangnat, L.; Posom, J.; Saikaew, K.; Phuphaphud, A.; Wongpichet, S.; Chinapas, A.; Sukpancharoen, S.; Saengprachatanarug, K. Time-efficient low-resolution RGB aerial imaging for precision mapping of weed types in site-specific herbicide application. Crop Prot. 2024, 184, 106805. [Google Scholar] [CrossRef]
  11. Hlaing, S.H.; Khaing, A.S. Weed and crop segmentation and classification using area thresholding. Int. J. Eng. Res. 2014, 3, 375–380. [Google Scholar]
  12. Hiremath, S.; Tolpekin, V.A.; van der Heijden, G.; Stein, A. Segmentation of Rumex obtusifolius using Gaussian Markov random fields. Mach. Vis. Appl. 2013, 24, 845–854. [Google Scholar] [CrossRef]
  13. Guijarro, M.; Riomoros, I.; Pajares, G.; Zitinski, P. Discrete wavelets transform for improving greenness image segmentation in agricultural images. Comput. Electron. Agric. 2015, 118, 396–407. [Google Scholar] [CrossRef]
  14. Xu, Y.; Yang, Y.; Chen, X.; Liu, Y. Bibliometric analysis of global NDVI research trends from 1985 to 2021. Remote Sens. 2022, 14, 3967. [Google Scholar] [CrossRef]
  15. Le, V.N.T.; Apopei, B.; Alameh, K. Effective plant discrimination based on the combination of local binary pattern operators and multiclass support vector machine methods. Inf. Process. Agric. 2019, 6, 116–131. [Google Scholar]
  16. Guerrero, J.M.; Pajares, G.; Montalvo, M.; Romeo, J.; Guijarro, M. Support vector machines for crop/weeds identification in maize fields. Expert Syst. Appl. 2012, 39, 11149–11155. [Google Scholar] [CrossRef]
  17. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef]
  18. Bo, W.; Liu, J.; Fan, X.; Tjahjadi, T.; Ye, Q.; Fu, L. BASNet: Burned area segmentation network for real-time detection of damage maps in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5627913. [Google Scholar] [CrossRef]
  19. Wu, X.; Fan, X.; Luo, P.; Choudhury, S.D.; Tjahjadi, T.; Hu, C. From laboratory to field: Unsupervised domain adaptation for plant disease recognition in the wild. Plant Phenomics 2023, 5, 0038. [Google Scholar] [CrossRef] [PubMed]
  20. Wang, Q.; Fan, X.; Zhuang, Z.; Tjahjadi, T.; Jin, S.; Huan, H.; Ye, Q. One to All: Toward a Unified Model for Counting Cereal Crop Heads Based on Few-Shot Learning. Plant Phenomics 2023, 6, 0271. [Google Scholar] [CrossRef]
  21. Fan, X.; Luo, P.; Mu, Y.; Zhou, R.; Tjahjadi, T.; Ren, Y. Leaf image based plant disease identification using transfer learning and feature fusion. Comput. Electron. Agric. 2022, 196, 106892. [Google Scholar] [CrossRef]
  22. Wang, P.; Tang, Y.; Luo, F.; Wang, L.; Li, C.; Niu, Q.; Li, H. Weed25: A deep learning dataset for weed identification. Front. Plant Sci. 2022, 13, 1053329. [Google Scholar] [CrossRef] [PubMed]
  23. Nong, C.; Fan, X.; Wang, J. Semi-supervised learning for weed and crop segmentation using UAV imagery. Front. Plant Sci. 2022, 13, 927368. [Google Scholar] [CrossRef]
  24. McCool, C.; Perez, T.; Upcroft, B. Mixtures of lightweight deep convolutional neural networks: Applied to agricultural robotics. IEEE Robot. Autom. Lett. 2017, 2, 1344–1351. [Google Scholar] [CrossRef]
  25. Champ, J.; Mora-Fallas, A.; Goëau, H.; Mata-Montero, E.; Bonnet, P.; Joly, A. Instance segmentation for the fine detection of crop and weed plants by precision agricultural robots. Appl. Plant Sci. 2020, 8, e11373. [Google Scholar] [CrossRef]
  26. Zou, K.; Chen, X.; Wang, Y.; Zhang, C.; Zhang, F. A modified U-Net with a specific data argumentation method for semantic segmentation of weed images in the field. Comput. Electron. Agric. 2021, 187, 106242. [Google Scholar] [CrossRef]
  27. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media. 2022, 8, 331–368. [Google Scholar] [CrossRef]
  28. Wang, J.; Du, S.P. Study on the image segmentation of field crops based on the fusion of infrared and visible-light images. In 2010 Symposium on Photonics and Optoelectronics; IEEE: New York, NY, USA, 2010; pp. 1–4. [Google Scholar]
  29. Ma, J.; Plonka, G. The curvelet transform. IEEE Signal Process. Mag. 2010, 27, 118–133. [Google Scholar] [CrossRef]
  30. Fawakherji, M.; Potena, C.; Pretto, A.; Bloisi, D.D.; Nardi, D. Multi-spectral image synthesis for crop/weed segmentation in precision farming. Robot. Auton. Syst. 2021, 146, 103861. [Google Scholar] [CrossRef]
  31. Xu, B.; Meng, R.; Chen, G.; Liang, L.; Lv, Z.; Zhou, L.; Sun, R.; Zhao, F.; Yang, W. Improved weed mapping in corn fields by combining UAV-based spectral, textural, structural, and thermal measurements. Pest Manag. Sci. 2023, 79, 2591–2602. [Google Scholar] [CrossRef] [PubMed]
  32. Huang, Y.; Du, C.; Xue, Z.; Chen, X.; Zhao, H.; Huang, L. What makes multi-modal learning better than single (provably). Adv. Neural Inf. Process. Syst. 2021, 34, 10944–10956. [Google Scholar]
  33. Bosilj, P.; Duckett, T.; Cielniak, G. Analysis of morphology-based features for classification of crop and weeds in precision agriculture. IEEE Robot. Autom. Lett. 2018, 3, 2950–2956. [Google Scholar] [CrossRef]
  34. Sural, S.; Qian, G.; Pramanik, S. Segmentation and histogram generation using the HSV color space for image retrieval. In Proceedings International Conference on Image Processing, Rochester, New York, USA, 22–25 September 2002; IEEE: New York, NY, USA, 2002; Volume 2, p. 2. [Google Scholar]
  35. Setiawan, A.W.; Mengko, T.R.; Santoso, O.S.; Suksmono, A.B. Color retinal image enhancement using CLAHE. In International Conference on ICT for Smart Society; IEEE: New York, NY, USA, 2013; pp. 1–3. [Google Scholar]
  36. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
  37. Zhang, J.; Liu, H.; Yang, K.; Hu, X.; Liu, R.; Stiefelhagen, R. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers. IEEE Trans. Intel. Transp. Syst. 2023, 24, 12. [Google Scholar] [CrossRef]
  38. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  39. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  40. Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 267–283. [Google Scholar]
  41. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  42. Huang, L.; Yuan, Y.; Guo, J.; Zhang, C.; Chen, X.; Wang, J. Interlaced sparse self-attention for semantic segmentation. arXiv 2019, arXiv:1907.12273. [Google Scholar]
  43. Yuan, Y.; Chen, X.; Chen, X.; Wang, J. Segmentation transformer: Object-contextual representations for semantic segmentation. arXiv 2019, arXiv:1909.11065. [Google Scholar]
  44. Meher, B.; Agrawal, S.; Panda, R.; Dora, L.; Abraham, A. Visible and infrared image fusion using an efficient adaptive transition region extraction technique. Eng. Sci. Technol. Int. J. 2022, 29, 101037. [Google Scholar] [CrossRef]
Figure 1. Samples from the Sugar Beets 2016 and Sunflower datasets: (a) Sugar Beets 2016 dataset; (b) Sunflower dataset.
Figure 2. Data augmentation using HSV and CLAHE.
Figure 3. Structure diagram of CMFNet.
Figure 4. Structure diagram of Segformer Encoder–Decoder.
Figure 5. Cross-modal feature fusion module.
Figure 6. Structure of fusion feature refinement module.
Figure 7. IoU performance of the proposed CMFNet compared to other methods on both datasets.
Figure 8. Acc performance of the proposed CMFNet compared to other methods on both datasets.
Figure 9. Visual comparison of the proposed CMFNet and other segmentation methods on the two datasets. (a–c) represent the original RGB images, NIR images, and their corresponding ground-truth masks, respectively. (d–k) denote the predicted masks using DeepLab V3+, PSANet, GCNet, ISANet, OCRNet, Segformer-RGB, Segformer-NIR, and CMFNet.
Table 1. Experiment parameters.

Hyper-Parameter | Value
Optimizer | AdamW
Epoch | 200
Batch size | 2
Learning rate | 0.00006
Momentum | 0.9
Weight decay | 0.01
Table 2. Quantification results using the proposed CMFNet compared to various deep learning-based semantic segmentation methods.

Model | Sugar Beets 2016 mIoU/% | Sugar Beets 2016 mAcc/% | Sunflower mIoU/% | Sunflower mAcc/% | Weights/MB
DeepLab V3+ | 85.37 | 89.98 | 85.87 | 90.41 | 501
PSANet | 81.07 | 85.6 | 84.04 | 88.52 | 626
GCNet | 82.21 | 87.29 | 84.51 | 89.55 | 550
ISANet | 80.53 | 85.89 | 84.62 | 90.22 | 454
OCRNet | 85.17 | 91.24 | 85.94 | 89.67 | 490
Segformer-RGB | 87.22 | 92.1 | 86.69 | 90.95 | 45
Segformer-NIR | 82.81 | 88.05 | 84.87 | 89.27 | 45
Segformer-ADF | 87.1 | 92.08 | 87.21 | 91.84 | 45
CMFNet | 90.86 | 93.8 | 90.77 | 94.35 | 106
Table 3. Results of different fusion modules.

Fusion Method | Sugar Beets 2016 mIoU/% | Sugar Beets 2016 mAcc/% | Sunflower mIoU/% | Sunflower mAcc/% | Weight/MB | FLOPs/M
Conv Concat | 87.5 | 91.11 | 85.54 | 89.72 | 91 | 8.2
CMFEM | 89.2 | 93.41 | 89.96 | 92.9 | 100 | 9.8
FRM | 89.96 | 92.9 | 88.65 | 91.53 | 95 | 9.2
CMFEM + FRM (CMFNet) | 90.86 | 93.8 | 90.77 | 94.35 | 106 | 10.8
Table 4. Performance using different loss ratios.

No. | Sugar Beets 2016 mIoU/% | Sugar Beets 2016 mAcc/% | Sunflower mIoU/% | Sunflower mAcc/% | λ
1 | 89.88 | 93.17 | 89.94 | 93.95 | 0
2 | 90.33 | 93.4 | 90.59 | 94.01 | 0.1
3 | 90.86 | 93.8 | 90.77 | 94.35 | 0.2
4 | 90.27 | 93.11 | 90.68 | 93.55 | 0.5
5 | 88.85 | 92.76 | 89.03 | 93.71 | 1