CN109145770B

CN109145770B - Automatic wheat spider counting method based on combination of multi-scale feature fusion network and positioning model

Info

Publication number: CN109145770B
Application number: CN201810863041.1A
Authority: CN
Inventors: 李�瑞; 王儒敬; 谢成军; 张洁; 陈天娇; 陈红波; 胡海瀛
Original assignee: Hefei Institutes of Physical Science of CAS
Current assignee: Hefei Institutes of Physical Science of CAS
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2022-07-15
Anticipated expiration: 2038-08-01
Also published as: CN109145770A

Abstract

The invention relates to a method for automatically counting wheat spiders based on combination of a multi-scale feature fusion network and a positioning model, which overcomes the defect of high error rate of image detection aiming at small targets compared with the prior art. The invention comprises the following steps: establishing a training sample; constructing a wheat spider detection counting model; acquiring an image to be counted; and obtaining the number of the wheat spiders. The invention realizes the direct identification and counting of the wheat spiders under the field natural environment.

Description

An automatic counting method of wheat spiders based on the combination of multi-scale feature fusion network and localization model

技术领域technical field

本发明涉及图像识别技术领域，具体来说是一种基于多尺度特征融合网络与定位模型相结合的麦蜘蛛自动计数方法。The invention relates to the technical field of image recognition, in particular to a wheat spider automatic counting method based on the combination of a multi-scale feature fusion network and a positioning model.

背景技术Background technique

小麦是我国主要的粮食作物之一，在小麦生产过程中，容易受到多种害虫危害，麦蜘蛛是其中之一,它刺吸麦叶汁液、甚至干枯，严重影响小麦的产量。害虫种群数量的检测是害虫防治的重要手段，为害虫防治决策提供了理论依据。因此，田间麦蜘蛛的识别与计数对于提高小麦产量至关重要。Wheat is one of the main food crops in my country. In the process of wheat production, it is vulnerable to a variety of pests. Wheat spider is one of them. It sucks the juice of wheat leaves and even dries up, which seriously affects the output of wheat. The detection of pest population is an important means of pest control, which provides a theoretical basis for pest control decision-making. Therefore, identification and counting of wheat spiders in the field is crucial for improving wheat yield.

随着计算机视觉技术和图像处理技术的快速发展，基于图像的害虫自动识别与计数技术在近年来已成为研究热点。虽然此方法省时省力、具有智能化等优点，但是其不能适用于田间麦蜘蛛的识别与计数。原因在于：首先，麦蜘蛛个体很小只有几毫米大小，针对这样的小目标利用传统的图像识别技术(SVM)很难检测到；其次，采集图像时，外界环境的光照不稳定、不均匀都会影响图像的质量；再者，在实际应用中，采集的图像常混有其他杂物，背景较复杂。With the rapid development of computer vision technology and image processing technology, image-based pest automatic identification and counting technology has become a research hotspot in recent years. Although this method is time-saving and labor-saving, and has the advantages of intelligence, it is not suitable for the identification and counting of field wheat spiders. The reasons are: first, the individual wheat spiders are only a few millimeters in size, and it is difficult to detect such small targets using traditional image recognition technology (SVM). It affects the quality of the image; moreover, in practical applications, the collected images are often mixed with other debris, and the background is more complicated.

因此，如何在复杂的环境下实现麦蜘蛛这类小目标的检测已经成为急需解决的技术问题。Therefore, how to realize the detection of small targets such as wheat spiders in a complex environment has become a technical problem that needs to be solved urgently.

发明内容SUMMARY OF THE INVENTION

本发明的目的是为了解决现有技术中针对小目标进行图像检测误差率高的缺陷，提供一种基于多尺度特征融合网络与定位模型相结合的麦蜘蛛自动计数方法来解决上述问题。The purpose of the present invention is to solve the defect of high error rate of image detection for small targets in the prior art, and to provide an automatic counting method of wheat spiders based on the combination of multi-scale feature fusion network and positioning model to solve the above problems.

为了实现上述目的，本发明的技术方案如下：In order to achieve the above object, technical scheme of the present invention is as follows:

一种基于多尺度特征融合网络与定位模型相结合的麦蜘蛛自动计数方法，包括以下步骤：An automatic counting method for wheat spiders based on the combination of multi-scale feature fusion network and localization model, comprising the following steps:

训练样本的建立，获取多于2000张田间自然环境下麦蜘蛛图像作为训练图像，且图像中麦蜘蛛已进行标记，得到训练样本；For the establishment of training samples, more than 2000 images of wheat spiders in the natural environment in the field are obtained as training images, and the wheat spiders in the images have been marked to obtain training samples;

构造麦蜘蛛检测计数模型；Construct the wheat spider detection and counting model;

构造定位模型；Construct a positioning model;

构造多尺度特征融合网络，对多尺度特征融合网络结构进行改造；Construct a multi-scale feature fusion network, and transform the structure of the multi-scale feature fusion network;

训练多尺度特征融合网络，根据定位模型针对训练样本定位出来的候选区域的特征进行训练，将其每层的输出结果作为预测结果；Train the multi-scale feature fusion network, train the features of the candidate regions located by the training samples according to the localization model, and use the output results of each layer as the prediction results;

待计数图像的获取，获取田间拍摄的麦蜘蛛图像，并进行预处理得到待计数图像；The acquisition of the images to be counted is to acquire the images of wheat spiders photographed in the field, and preprocess to obtain the images to be counted;

麦蜘蛛个数的获得，将待计数图像输入麦蜘蛛检测计数模型，得到图像中麦蜘蛛个数。To obtain the number of wheat spiders, input the image to be counted into the wheat spider detection and counting model to obtain the number of wheat spiders in the image.

所述的构造定位模型包括以下步骤：The described construction positioning model includes the following steps:

设定颜色空间转换模块，颜色空间转换模块用于将RGB颜色空间转换为YcbCr颜色空间，并分割为R＝{r₁,r₂,...r_n}个分割区域；Set the color space conversion module, the color space conversion module is used to convert the RGB color space to the YcbCr color space, and divide it into R={r ₁ , r ₂ ,...r _n } divided regions;

计算色彩信息相似度，使用L1范式归一化获取图像每个颜色通道的25个直方图，计算该色彩空间的相似度，其计算公式如下：Calculate the similarity of color information, use L1 normalization to obtain 25 histograms of each color channel of the image, and calculate the similarity of the color space. The calculation formula is as follows:

其中，f_color(r_i,r_j)表示分割区域r_i与r_j的色彩空间相似度；

表示第i个通道、第k个直方图向量，i＝1,2,3，k＝0,1....,25，n表示直方图个数，r_i表示分割区域R＝{r₁,r₂,...r_n}第i个区域；

表示第j个通道、第k个直方图向量，j＝1,2,3，m表示m个直方图；Among them, f _color (r _i , r _j ) represents the color space similarity _{between the segmented regions ri and r j} _;

Indicates the i-th channel and the k-th histogram vector, i=1, 2, 3, k=0, 1...., 25, n represents the number of histograms, ri _i represents the segmented area R={r ₁ ,r ₂ ,...r _n } i-th area;

Represents the jth channel and the kth histogram vector, j=1, 2, 3, m represents m histograms;

计算边缘信息相似度，对每个颜色通道的8个不同方向计算方差σ＝1的高斯微分，每个通道每个颜色获取10个的直方图使用L1范式归一化，计算边缘信息相似度，其计算公式如下：Calculate the similarity of edge information, calculate the Gaussian differential with variance σ=1 for 8 different directions of each color channel, obtain 10 histograms for each color of each channel and use L1 normalization to calculate the similarity of edge information, Its calculation formula is as follows:

其中，f_edage(r_i,r_j)表示分割区域r_i与r_j的边缘信息相似度，

表示第i个通道、第k个直方图向量，

表示第j个通道、第k个直方图向量，其中i＝1,2,3，j＝1,2,3，k＝0,1....,10；n表示直方图个数，m表示m个直方图，r_i表示分割区域R＝{r₁,r₂,...r_n}第i个区域；Among them, f _edage (r _i , r _j ) represents the edge information similarity _{between the segmented regions ri and r j} _,

represents the i-th channel, the k-th histogram vector,

Represents the jth channel and the kth histogram vector, where i=1,2,3, j=1,2,3, k=0,1....,10; n represents the number of histograms, m represents m histograms, and ri represents the _i -th region of the segmented region R={r ₁ , _r ₂ , ... rn };

计算区域大小相似度，其计算公式如下：Calculate the similarity of the size of the area, and its calculation formula is as follows:

其中f_area(r_i,r_j)表示表示分割区域r_i与r_j的区域大小相似度，area()表示区域面积；area(img)表示图片面积，r_i表示分割区域R＝{r₁,r₂,...r_n}第i个区域，r_j表示分割区域R＝{r₁,r₂,...r_n}第j个区域；where f _area (r _i , r _j ) represents the _{similarity of the size of the segmented areas ri and r j} _, area() represents the area area; area(img) represents the image area, and _ri represents the segmented area R={r ₁ ,r ₂ ,...rn } i- _th area, r _j represents the division area R={r ₁ ,r ₂ ,... rn } j- _th area;

将色彩信息相似度、边缘信息相似度、区域大小相似度融合，其计算公式如下：The color information similarity, edge information similarity, and area size similarity are fused, and the calculation formula is as follows:

f(r_i,r_j)＝w₁f_color(r_i,r_j)+w₂f_edage(r_i,r_j)+w₃f_area(r_i,r_j)，f(r _i ,r _j )=w ₁ f _color (r _i ,r _j )+w ₂ f _edage (r _i ,r _j )+w ₃ f _area (r _i ,r _j ),

其中，f(r_i,r_j)表示分割区域r_i与r_j融合后的相似度，w₁、w₂、w₃分别表示信息相似度、边缘信息相似度、区域大小相似度的权值，r_i表示分割区域R＝{r₁,r₂,...r_n}第i个区域，r_j表示分割区域R＝{r₁,r₂,...r_n}第j个区域。Among them, f(r _i , r _j ) represents the similarity of the segmentation regions ri and r _j _after fusion, w ₁ , w ₂ , and w ₃ represent the weights of information similarity, edge information similarity, and region size similarity, respectively , ri represents the _ith area of the divided area R={r ₁ , r ₂ ,...rn }, _r _j represents the _jth area of the divided area R={r ₁ , r ₂ ,... rn } .

所述的构造多尺度特征融合网络包括以下步骤：The described construction of a multi-scale feature fusion network includes the following steps:

设定n层多尺度神经网络，从最顶层开始进行反卷积操作生成反卷积层；Set up an n-layer multi-scale neural network, and perform deconvolution operations from the top layer to generate a deconvolution layer;

设定第1层的输入为训练样本、输出第1层特征图，第1层特征图作为第2层的输入、输出第2层特征图，第2层特征图作为第3层的输入，…直至第n-1层特征图作为第n层的输入；Set the input of the first layer as the training sample, output the feature map of the first layer, the feature map of the first layer as the input of the second layer, output the feature map of the second layer, the feature map of the second layer as the input of the third layer, … Until the n-1th layer feature map is used as the input of the nth layer;

将第1层特征图、第2层特征图…第n层特征图与对应的第1层反卷积层、第二层反卷积层、…第n层反卷积层通过1*1卷积核连接，生成多尺度特征融合网络。The first layer feature map, the second layer feature map...nth layer feature map and the corresponding first layer deconvolution layer, second layer deconvolution layer, ... nth layer deconvolution layer through 1*1 volume The kernel connection is accumulated to generate a multi-scale feature fusion network.

所述的训练多尺度特征融合网络包括以下步骤：The described training multi-scale feature fusion network includes the following steps:

将训练样本输入定位模型，定位模型定位出训练样本的候选区域；Input the training samples into the positioning model, and the positioning model locates the candidate regions of the training samples;

将训练样本的候选区域分别输入多尺度神经网络的第1层，多尺度神经网络的第1层输出第1层特征图；The candidate regions of the training samples are respectively input into the first layer of the multi-scale neural network, and the first layer of the multi-scale neural network outputs the first layer feature map;

第1层特征图输入多尺度神经网络的第2层，多尺度神经网络的第1层输出第2层特征图，直至第n-1层特征图输入多尺度神经网络的第n层；The feature map of the first layer is input to the second layer of the multi-scale neural network, and the first layer of the multi-scale neural network outputs the feature map of the second layer, until the feature map of the n-1 layer is input to the nth layer of the multi-scale neural network;

对第n层特征图进行反卷积操作，生成第n层反卷积层，对第n-1层进行反卷积操作，生成第n-1反卷积层，由此直至第1层反卷积层；Perform a deconvolution operation on the feature map of the nth layer, generate the nth layer convolutional layer;

第1层特征图、第2层特征图、…直至第n层特征图与第1层反卷积层、第2层反卷积层、…直至第n层反卷积层通过1*1卷积核连接；The first layer feature map, the second layer feature map, ... until the nth layer feature map and the first layer deconvolution layer, the second layer deconvolution layer, ... until the nth layer deconvolution layer through 1*1 volume Accumulation nuclear connection;

第1层特征图与第1层反卷积层经过1*1卷积核连接后提取第一层特征后，生成第一层预测结果；第2层特征图与第2层反卷积层经过1*1卷积核连接后提取第二层特征后，生成第二层预测结果；…直至第n层特征图与第n层反卷积层经过1*1卷积核连接后提取第n层特征后，生成第n层预测结果；The feature map of the first layer and the deconvolution layer of the first layer are connected by a 1*1 convolution kernel to extract the features of the first layer, and then the prediction result of the first layer is generated; the feature map of the second layer and the deconvolution layer of the second layer are passed through After the 1*1 convolution kernel is connected, the second layer features are extracted, and the second layer prediction result is generated; ... until the nth layer feature map and the nth layer deconvolution layer are connected by 1*1 convolution kernel, the nth layer is extracted. After the feature, the nth layer prediction result is generated;

对第1层预测结果、第2层预测结果、…直至第n层结果做回归处理，生成最终的预测结果，回归函数如下：Perform regression processing on the first-layer prediction results, the second-layer prediction results, ... until the n-th layer results to generate the final prediction results. The regression function is as follows:

其中，C(λ)为最终的预测结果，λ表示训练参数，n表示网络层数，y^(j)表示真实的类别，p_λ(x^(j))表示第j层预测的结果；x^(j)表示第j层的特征向量；Among them, C(λ) is the final prediction result, λ is the training parameter, n is the number of network layers, y ^(j) is the real category, p _λ (x ^(j) ) is the prediction result of the jth layer; x ^{( j)} represents the feature vector of the jth layer;

通过C(λ)得到最终得分预测类别与类别在图中所在的坐标。The final score prediction category and the coordinates of the category in the graph are obtained by C(λ).

所述麦蜘蛛个数的获得包括以下步骤：The acquisition of the number of wheat spiders comprises the following steps:

将待计数图像输入定位模型，定位模型定位出待计数图像的候选区域；Input the image to be counted into the positioning model, and the positioning model locates the candidate area of the image to be counted;

将待计数图像的候选区域输入多尺度神经网络，得到图像中麦蜘蛛的预测分类，对麦蜘蛛数量进行统计，得到图像中麦蜘蛛个数。Input the candidate area of the image to be counted into the multi-scale neural network to obtain the predicted classification of the spider in the image, and count the number of spiders to obtain the number of spiders in the image.

有益效果beneficial effect

本发明的一种基于多尺度特征融合网络与定位模型相结合的麦蜘蛛自动计数方法，与现有技术相比实现了对田间自然环境下的麦蜘蛛进行直接识别、计数。Compared with the prior art, the automatic counting method of wheat spiders based on the combination of a multi-scale feature fusion network and a positioning model of the present invention realizes the direct identification and counting of the wheat spiders in the field natural environment.

本发明通过预处理消除了光照对检测计数的影响，将复杂环境简单化；再通过定位模型方法定位出疑似麦蜘蛛的候选区域；对候选区域利用多尺度特征融合网络进行特征提取，然后通过多预测结果回归最终确定麦蜘蛛区域。候选区域的定位(确定)大大减少了特征提取时间与特征维数，增强了计数的实时性；同时，多预测结果的回归融合保证了各个尺度的麦蜘蛛都能准确的检测到，提高了自动检测计数的鲁棒性与准确度。The invention eliminates the influence of illumination on the detection count through preprocessing, and simplifies the complex environment; then locates the candidate area of the suspected wheat spider through the localization model method; uses the multi-scale feature fusion network to extract the feature of the candidate area, and then uses the multi-scale feature fusion network to extract features. The prediction result regression finally determines the wheat spider area. The positioning (determination) of the candidate area greatly reduces the feature extraction time and feature dimension, and enhances the real-time performance of counting. Robustness and accuracy of detection counts.

附图说明Description of drawings

图1为本发明的方法顺序图；Fig. 1 is the method sequence diagram of the present invention;

图2a为现有技术中采用传统SVM技术对训练样本的检测结果图；Fig. 2a is the detection result diagram that adopts traditional SVM technology to training sample in the prior art;

图2b为利用本发明方法对的检测结果图；Fig. 2b is the detection result diagram of utilizing the method of the present invention;

图3为本发明中多尺度特征融合网络结构示意图。FIG. 3 is a schematic diagram of the structure of a multi-scale feature fusion network in the present invention.

具体实施方式Detailed ways

为使对本发明的结构特征及所达成的功效有更进一步的了解与认识，用以较佳的实施例及附图配合详细的说明，说明如下：In order to have a further understanding and understanding of the structural features of the present invention and the effects achieved, the preferred embodiments and accompanying drawings are used in conjunction with detailed descriptions, and the descriptions are as follows:

如图1所示，本发明所述的一种基于多尺度特征融合网络与定位模型相结合的麦蜘蛛自动计数方法，包括以下步骤：As shown in Figure 1, a method for automatic counting of wheat spiders based on the combination of a multi-scale feature fusion network and a positioning model according to the present invention includes the following steps:

第一步，训练样本的建立。获取多于2000张田间自然环境下麦蜘蛛图像作为训练图像，且图像中麦蜘蛛已进行标记，得到训练样本。The first step is the establishment of training samples. Obtain more than 2000 images of wheat spiders in the natural environment of the field as training images, and the wheat spiders in the images have been marked to obtain training samples.

第二步，构造麦蜘蛛检测计数模型。构造定位模型和多尺度特征融合网络，利用定位模型提取训练样本的的候选区域，(定位出麦蜘蛛的候选区域)，再通过多尺度融合网络提取候选区域的特征后对其分类，如果是麦蜘蛛输出坐标值，如果不是麦蜘蛛候选区取消。The second step is to construct the wheat spider detection and counting model. Construct a localization model and a multi-scale feature fusion network, use the localization model to extract the candidate area of the training sample, (locate the candidate area of the wheat spider), and then extract the features of the candidate area through the multi-scale fusion network and classify it. Spider output coordinate value, if it is not a wheat spider candidate area, cancel.

首先，构造定位模型。为了减少特征提取时间，减少特征向量维数，增强自动计数的实时性，在此首先使用定位模型，定位出麦蜘蛛的候选区域，然后根据候选区域进行特征提取。First, a positioning model is constructed. In order to reduce the time of feature extraction, reduce the dimension of feature vector, and enhance the real-time performance of automatic counting, the localization model is used first to locate the candidate area of wheat spider, and then the feature extraction is carried out according to the candidate area.

其步骤如下：The steps are as follows:

(1)设定颜色空间转换模块，颜色空间转换模块用于将RGB颜色空间转换为YcbCr颜色空间，并分割为R＝{r₁,r₂,...r_n}个分割区域。(1) Setting a color space conversion module, the color space conversion module is used to convert the RGB color space into the YcbCr color space, and divide it into R={r ₁ , _r ₂ ,...rn } divided regions.

(2)计算色彩信息相似度。使用L1范式归一化获取图像每个颜色通道的25个直方图，计算该色彩空间的相似度，其计算公式如下：(2) Calculate the similarity of color information. Use L1 normalization to obtain 25 histograms of each color channel of the image, and calculate the similarity of the color space. The calculation formula is as follows:

其中，f_color(r_i,r_j)表示分割区域r_i与r_j的色彩空间相似度；

表示第j个通道、第k个直方图向量，j＝1,2,3，m表示m个直方图。Among them, f _color (r _i , r _j ) represents the color space similarity _{between the segmented regions ri and r j} _;

Represents the jth channel and the kth histogram vector, j=1, 2, 3, m represents m histograms.

(3)计算边缘信息相似度。对每个颜色通道的8个不同方向计算方差σ＝1的高斯微分，每个通道每个颜色获取10个的直方图使用L1范式归一化，计算边缘信息相似度，其计算公式如下：(3) Calculate edge information similarity. Calculate the Gaussian differential with variance σ=1 for 8 different directions of each color channel, obtain 10 histograms for each color channel and use the L1 norm to normalize, and calculate the edge information similarity. The calculation formula is as follows:

其中，f_edage(r_i,r_j)表示分割区域r_i与r_j的边缘信息相似度，

表示第i个通道、第k个直方图向量，

表示第j个通道、第k个直方图向量，其中i＝1,2,3，j＝1,2,3，k＝0,1....,10；n表示直方图个数，m表示m个直方图，r_i表示分割区域R＝{r₁,r₂,...r_n}第i个区域。Among them, f _edage (r _i , r _j ) represents the edge information similarity _{between the segmented regions ri and r j} _,

represents the i-th channel, the k-th histogram vector,

Represents the jth channel and the kth histogram vector, where i=1,2,3, j=1,2,3, k=0,1....,10; n represents the number of histograms, m represents m histograms, and ri represents the _ith region of the divided region R={r ₁ , _r ₂ , . . . rn }.

(4)计算区域大小相似度，其计算公式如下：(4) Calculate the similarity of the size of the area, and the calculation formula is as follows:

其中f_area(r_i,r_j)表示表示分割区域r_i与r_j的区域大小相似度，area()表示区域面积；area(img)表示图片面积，r_i表示分割区域R＝{r₁,r₂,...r_n}第i个区域，r_j表示分割区域R＝{r₁,r₂,...r_n}第j个区域。where f _area (r _i , r _j ) represents the _{similarity of the size of the segmented areas ri and r j} _, area() represents the area area; area(img) represents the image area, and _ri represents the segmented area R={r ₁ ,r ₂ ,...rn } i- _th region, r _j represents the division region R={r ₁ ,r ₂ ,... rn } j- _th region.

(5)将色彩信息相似度、边缘信息相似度、区域大小相似度融合，其计算公式如下：(5) Fusion of color information similarity, edge information similarity, and area size similarity, and the calculation formula is as follows:

经过色彩信息形似度、边缘信息相似度、区域大小相似度合并，r_i与r_j不断合最终生成的n个区域，即为麦蜘蛛的候选区域。After the color information similarity, edge information similarity, and region size similarity are merged, the n regions finally generated by r _i and r _j are continuously combined, which are the candidate regions of wheat spiders.

其次，构造多尺度特征融合网络，对多尺度特征融合网络结构进行改造。为了更好的提取每个尺度、各种形态的麦蜘蛛特征，设计了多尺度特征融合网络，来准确分辨出麦蜘蛛候选区域的准确区域。Secondly, construct a multi-scale feature fusion network, and transform the structure of the multi-scale feature fusion network. In order to better extract the features of each scale and various forms of the wheat spider, a multi-scale feature fusion network is designed to accurately distinguish the exact region of the wheat spider candidate region.

如图3所示，多尺度特征融合网络结构通过利用固有的多尺度和锥形层次结构的特征图来构造具有边际的多尺度特征网络，在此开发具有侧向连接的自顶向下的架构，用于在所有尺度上构建高级语义特征图，依靠一种通过自上而下的路径和横向连接将低分辨率但语义强的特征与高分辨率语义弱的特征结合在一起，这样就可以获得高分辨率、强语义的特征，有利于麦蜘蛛这种小目标的检测。其步骤如下：As shown in Fig. 3, the multi-scale feature fusion network structure constructs a multi-scale feature network with margins by exploiting the feature maps of the inherent multi-scale and conical hierarchy, where a top-down architecture with lateral connections is developed. , for building high-level semantic feature maps at all scales, relying on a way to combine low-resolution but semantically strong features with high-resolution semantically weak features through top-down paths and lateral connections, so that Obtaining high-resolution and strong semantic features is conducive to the detection of small targets such as wheat spiders. The steps are as follows:

(1)设定n层多尺度神经网络，从最顶层开始进行反卷积操作生成反卷积层。(1) Set up an n-layer multi-scale neural network, and perform deconvolution operations from the top layer to generate deconvolution layers.

(2)设定第1层的输入为训练样本、输出第1层特征图，第1层特征图作为第2层的输入、输出第2层特征图，第2层特征图作为第3层的输入，…直至第n-1层特征图作为第n层的输入。(2) Set the input of the first layer as the training sample, output the feature map of the first layer, the feature map of the first layer as the input of the second layer, output the feature map of the second layer, and the feature map of the second layer as the feature map of the third layer Input, ... up to the n-1th layer feature map as the input of the nth layer.

(3)将第1层特征图、第2层特征图…第n层特征图与对应的第1层反卷积层、第二层反卷积层、…第n层反卷积层通过1*1卷积核连接，生成多尺度特征融合网络。(3) Pass the feature map of the first layer, the feature map of the second layer ... the feature map of the nth layer and the corresponding first layer deconvolution layer, the second layer deconvolution layer, ... the nth layer deconvolution layer through 1 *1 The convolution kernel is connected to generate a multi-scale feature fusion network.

在此通过下采样生成特征图，将每一张训练图片作为输入，采用多尺度神经网络提取特征，多尺度神经网络每层通过下采样会生成一张特征图；Here, a feature map is generated by downsampling, each training image is used as input, and a multi-scale neural network is used to extract features, and each layer of the multi-scale neural network will generate a feature map through downsampling;

然后最后一层反卷积生成上一层大小的特征图依次迭代直到生成第二层大小为止。由于多尺度网络的每层都会下采样这样会导致特征图会越来越小，而特征图片中的麦蜘蛛就会更小甚至达到几个像素大小，对于麦蜘蛛检测计数影响很大。为避免这个问题，对每层金字塔图像采用反卷积操作，通过上采样将特征图放大到上一层大小这既能有效的提取害虫特征，又保证了麦蜘蛛在图片中的大小；Then the last layer of deconvolution generates a feature map of the size of the previous layer and iterates in turn until the size of the second layer is generated. Since each layer of the multi-scale network will be downsampled, the feature map will become smaller and smaller, and the wheat spider in the feature image will be smaller or even several pixels in size, which has a great impact on the detection count of the wheat spider. In order to avoid this problem, the deconvolution operation is used for each layer of the pyramid image, and the feature map is enlarged to the size of the previous layer by upsampling, which can not only effectively extract the pest features, but also ensure the size of the wheat spider in the picture;

将反卷积生成的各层特征图通过1*1卷积核连接，生成多尺度特征融合网络。The feature maps of each layer generated by deconvolution are connected through a 1*1 convolution kernel to generate a multi-scale feature fusion network.

最后，训练多尺度特征融合网络。根据定位模型针对训练样本定位出来的候选区域作为特征进行训练，将其每层的输出结果作为预测结果。其具体步骤如下：Finally, a multi-scale feature fusion network is trained. According to the positioning model, the candidate regions located by the training samples are used as features for training, and the output results of each layer are used as the prediction results. The specific steps are as follows:

(1)将训练样本输入定位模型，定位模型定位出训练样本的候选区域。(1) Input the training samples into the positioning model, and the positioning model locates the candidate regions of the training samples.

(2)将训练样本的候选区域分别输入多尺度神经网络的第1层，多尺度神经网络的第1层输出第1层特征图。(2) The candidate regions of the training samples are respectively input into the first layer of the multi-scale neural network, and the first layer of the multi-scale neural network outputs the first layer feature map.

(3)第1层特征图输入多尺度神经网络的第2层，多尺度神经网络的第1层输出第2层特征图，直至第n-1层特征图输入多尺度神经网络的第n层。(3) The first layer feature map is input to the second layer of the multi-scale neural network, the first layer of the multi-scale neural network outputs the second layer feature map, until the n-1 layer feature map is input to the nth layer of the multi-scale neural network .

(4)对第n层特征图进行反卷积操作，生成第n层反卷积层，对第n-1层进行反卷积操作，生成第n-1反卷积层，由此直至第1层反卷积层。(4) Perform the deconvolution operation on the feature map of the nth layer to generate the nth layer of deconvolution layer, perform the deconvolution operation on the n-1th layer, and generate the n-1th deconvolution layer, thus until the th 1 deconvolution layer.

(5)第1层特征图、第2层特征图、…直至第n层特征图与第1层反卷积层、第2层反卷积层、…直至第n层反卷积层通过1*1卷积核连接。(5) The first layer feature map, the second layer feature map, ... until the nth layer feature map and the first layer deconvolution layer, the second layer deconvolution layer, ... until the nth layer deconvolution layer passes 1 *1 convolution kernel connection.

(6)第1层特征图与第1层反卷积层经过1*1卷积核连接后提取第一层特征后，生成第一层预测结果；第2层特征图与第2层反卷积层经过1*1卷积核连接后提取第二层特征后，生成第二层预测结果；…直至第n层特征图与第n层反卷积层经过1*1卷积核连接后提取第n层特征后，生成第n层预测结果。(6) After the first layer feature map and the first layer deconvolution layer are connected by a 1*1 convolution kernel, the first layer feature is extracted, and the first layer prediction result is generated; the second layer feature map and the second layer are reversed After the layer is connected by a 1*1 convolution kernel, the second layer features are extracted, and the second layer prediction result is generated; ... until the nth layer feature map and the nth layer deconvolution layer are connected by a 1*1 convolution kernel and then extracted After the nth layer features, the nth layer prediction result is generated.

(7)对第1层预测结果、第2层预测结果、…直至第n层结果做回归处理，生成最终的预测结果，回归函数如下：(7) Perform regression processing on the first-layer prediction results, the second-layer prediction results, ... until the n-th layer results, and generate the final prediction results. The regression function is as follows:

(8)通过C(λ)得到最终得分预测类别与类别在图中所在的坐标，坐标就是麦蜘蛛在图中的位置。(8) Obtain the final score prediction category and the coordinates of the category in the picture through C(λ), and the coordinates are the position of the wheat spider in the picture.

第三步，待计数图像的获取。获取田间拍摄的麦蜘蛛图像，并进行预处理得到待计数图像。The third step is to acquire the image to be counted. Acquire images of wheat spiders taken in the field, and perform preprocessing to obtain images to be counted.

第四步，麦蜘蛛个数的获得。将待计数图像输入麦蜘蛛检测计数模型，得到图像中麦蜘蛛个数。其具体步骤如下：The fourth step is to obtain the number of wheat spiders. Input the image to be counted into the wheat spider detection and counting model to obtain the number of wheat spiders in the image. The specific steps are as follows:

(1)将待计数图像输入定位模型，定位模型定位出待计数图像的候选区域；(1) Input the image to be counted into the positioning model, and the positioning model locates the candidate region of the image to be counted;

(2)将待计数图像的候选区域输入多尺度神经网络，得到图像中麦蜘蛛的预测分类，对麦蜘蛛数量进行统计，，得到图像中麦蜘蛛个数。(2) Input the candidate region of the image to be counted into the multi-scale neural network to obtain the predicted classification of the spider in the image, and count the number of spiders to obtain the number of spiders in the image.

如图2a所示，其为利用SVM算法得出的麦蜘蛛检测结果图。从图2a中可以看到，其小框检测出的麦蜘蛛区域非常大，特别是在图2a中部大框范围，其错误地将多个相对集中的麦蜘蛛全部归入一个大框的范围内。产生这个错误标示的原因在于：传统的SVM算法并未进行先期定位，若采用定位模型先定位出候选区域，则可以避免此类现象；图2a中，产生小框检测出的麦蜘蛛区域非常大的原因在于：传统的SVM算法并未采用多个预测结果的回归融合，并且图2a中，还有部分小框出现误标识。As shown in Figure 2a, it is the result of wheat spider detection obtained by using the SVM algorithm. As can be seen from Figure 2a, the small box detects a very large area of the spider, especially in the large box in the middle of Figure 2a, which mistakenly includes multiple relatively concentrated spiders into one large box. . The reason for this mislabeling is that the traditional SVM algorithm does not perform pre-positioning. If the positioning model is used to locate the candidate area first, this phenomenon can be avoided; in Figure 2a, the small-frame detected wheat spider area is very large. The reason is that the traditional SVM algorithm does not use the regression fusion of multiple prediction results, and in Figure 2a, there are still some small boxes that are misidentified.

如图2b所示，相对于传统的SVM算法而言，本发明能够精准地定位出麦蜘蛛个数和具体位置，具有较高的鲁棒性与准确度。As shown in Fig. 2b, compared with the traditional SVM algorithm, the present invention can accurately locate the number and specific positions of the spiders, and has higher robustness and accuracy.

以上显示和描述了本发明的基本原理、主要特征和本发明的优点。本行业的技术人员应该了解，本发明不受上述实施例的限制，上述实施例和说明书中描述的只是本发明的原理，在不脱离本发明精神和范围的前提下本发明还会有各种变化和改进，这些变化和改进都落入要求保护的本发明的范围内。本发明要求的保护范围由所附的权利要求书及其等同物界定。The foregoing has shown and described the basic principles, main features and advantages of the present invention. It should be understood by those skilled in the art that the present invention is not limited by the above-mentioned embodiments. The above-mentioned embodiments and descriptions describe only the principles of the present invention. Without departing from the spirit and scope of the present invention, there are various Variations and improvements are intended to fall within the scope of the claimed invention. The scope of protection claimed by the present invention is defined by the appended claims and their equivalents.

Claims

1. A wheat spider automatic counting method based on combination of a multi-scale feature fusion network and a positioning model is characterized by comprising the following steps:

11) establishing a training sample, namely acquiring more than 2000 images of the wheat spiders in the field natural environment as training images, and marking the wheat spiders in the images to obtain the training sample;

12) constructing a wheat spider detection counting model;

121) constructing a positioning model; the construction positioning model comprises the following steps:

1211) Setting a color space conversion module, wherein the color space conversion module is used for converting the RGB color space into the YcbCr color space and dividing the RGB color space into R (R) and { R ═ R₁,r₂,...r_n} segmented areas;

1212) calculating the similarity of color information, obtaining 25 histograms of each color channel of the image by normalization using an L1 paradigm, and calculating the similarity of color space according to the following calculation formula:

wherein, f_color(r_i,r_j) Represents a divided region r_iAnd r_jThe degree of similarity in the color space of (c),

denotes the a channel, k histogram vector, a is 1,2,3, k is 0,1_iDenotes a divided region R ═ { R ═ R₁,r₂,...r_nThe ith area;

represents the b-th channel, the k-th histogram vector, b ═ 1,2, 3;

1213) calculating the similarity of the edge information, calculating the Gaussian differential with the variance sigma being 1 for 8 different directions of each color channel, acquiring 10 histograms for each color of each channel, and normalizing by using an L1 paradigm to calculate the similarity of the edge information, wherein the calculation formula is as follows:

wherein f is_edage(r_i,r_j) Indicates a divided region r_iAnd r_jThe degree of similarity of the edge information of (2),

representing the e-th channel, the k-th histogram vector,

represents the f-th channel, the k-th histogram vector, where e is 1,2,3, f is 1,2,3, k is 0,1.., 10; q represents the number of histograms, r _iDenotes a divided region R ═ R { (R)₁,r₂,...r_nThe ith area;

1214) calculating the similarity of the sizes of the regions, wherein the calculation formula is as follows:

wherein f is_area(r_i,r_j) Represents a divided region r_iAnd r_jArea () represents an area of the region; area (img) represents the area of the picture, r_iDenotes a divided region R ═ { R ═ R₁,r₂,...r_nI-th area, r_jDenotes a divided region R ═ { R ═ R₁,r₂,...r_nJth area;

1215) and fusing the color information similarity, the edge information similarity and the region size similarity, wherein the calculation formula is as follows:

f(r_i,r_j)＝w₁f_color(r_i,r_j)+w₂f_edage(r_i,r_j)+w₃f_area(r_i,r_j)，

wherein, f (r)_i,r_j) Represents a scoreCutting region r_iAnd r_jSimilarity after fusion, w₁、w₂、w₃Weights r respectively representing information similarity, edge information similarity, and region size similarity_iDenotes a divided region R ═ { R ═ R₁,r₂,...r_nI-th area, r_jDenotes a divided region R ═ { R ═ R₁,r₂,...r_nJth area;

122) constructing a multi-scale feature fusion network, and transforming a multi-scale feature fusion network structure; the construction of the multi-scale feature fusion network comprises the following steps:

1221) setting n layers of multi-scale neural networks, and performing deconvolution operation from the topmost layer to generate a deconvolution layer;

1222) setting the input of the layer 1 as a training sample, outputting a layer 1 feature map, taking the layer 1 feature map as the input of the layer 2 and outputting a layer 2 feature map, taking the layer 2 feature map as the input of the layer 3, and taking … to the layer n-1 feature map as the input of the layer n;

1223) Connecting the feature map of the layer 1 and the feature map of the layer 2 … of the nth layer with the corresponding deconvolution layer of the layer 1, the deconvolution layer of the second layer and the deconvolution layer of the layer … of the nth layer through 1-1 convolution kernels to generate a multi-scale feature fusion network;

123) training a multi-scale feature fusion network, training by taking a candidate region positioned by a positioning model aiming at a training sample as a feature, and taking an output result of each layer as a prediction result;

the training multi-scale feature fusion network comprises the following steps:

1231) inputting the training sample into a positioning model, and positioning a candidate region of the training sample by the positioning model;

1232) respectively inputting the candidate regions of the training sample into the 1 st layer of the multi-scale neural network, and outputting a 1 st layer characteristic diagram by the 1 st layer of the multi-scale neural network;

1233) inputting the feature map of the layer 1 into the layer 2 of the multi-scale neural network, and outputting the feature map of the layer 2 by the layer 2 of the multi-scale neural network until the feature map of the layer n-1 is input into the nth layer of the multi-scale neural network;

1234) performing deconvolution operation on the n-th layer of feature map to generate an n-th deconvolution layer, and performing deconvolution operation on the n-1-th layer to generate an n-1-th deconvolution layer, so as to obtain a 1-st deconvolution layer;

1235) The characteristic diagrams of the 1 st layer, the 2 nd layer, … to the nth layer are connected with the deconvolution layer of the 1 st layer, the deconvolution layer of the 2 nd layer, … to the nth layer by 1 × 1 convolution kernels;

1236) connecting the feature map of the 1 st layer with the deconvolution layer of the 1 st layer through a 1 x 1 convolution kernel, extracting features of the first layer, and generating a prediction result of the first layer; connecting the feature map of the 2 nd layer with the deconvolution layer of the 2 nd layer through a 1 x 1 convolution kernel, extracting features of a second layer, and generating a prediction result of the second layer; …, generating an nth layer of prediction result until the nth layer of feature graph is connected with the nth layer of deconvolution layer through a 1 x 1 convolution kernel and then the nth layer of feature is extracted;

1237) and performing regression processing on the prediction result of the 1 st layer, the prediction result of the 2 nd layer, … till the result of the nth layer to generate a final prediction result, wherein the regression function is as follows:

wherein C (lambda) is the final prediction result, lambda represents the training parameter, n represents the number of network layers, y represents^(j)Representing true classes, p_λ(x^(j)) Representing the result of the prediction of the j-th layer; x is the number of^(j)A feature vector representing a j-th layer;

1238) obtaining a final score through C (lambda) to predict the coordinates of the category and the position of the category in the graph;

13) acquiring an image to be counted, acquiring a wheat spider image shot in the field, and preprocessing the image to be counted;

14) And (3) obtaining the number of the wheat spiders, namely inputting the image to be counted into a wheat spider detection counting model to obtain the number of the wheat spiders in the image.

2. The method for automatically counting the number of the wheat spiders based on the combination of the multi-scale feature fusion network and the positioning model as claimed in claim 1, wherein the obtaining of the number of the wheat spiders comprises the following steps:

21) inputting the image to be counted into a positioning model, and positioning a candidate area of the image to be counted by the positioning model;

22) inputting the candidate area of the image to be counted into a multi-scale neural network to obtain the prediction classification of the wheat spiders in the image, and counting the number of the wheat spiders to obtain the number of the wheat spiders in the image.