
CN114187590A - Method and system for identifying target fruits under homochromatic system background - Google Patents


Info

Publication number
CN114187590A
CN114187590A (application CN202111228465.9A)
Authority
CN
China
Prior art keywords
target
model
processing
orchard environment
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111228465.9A
Other languages
Chinese (zh)
Inventor
贾伟宽
孟虎
卢宇琪
贾艺鸣
牛屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202111228465.9A priority Critical patent/CN114187590A/en
Publication of CN114187590A publication Critical patent/CN114187590A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for identifying target fruits under a same-color-system background, belonging to the technical field of computer vision. An orchard environment image to be identified is acquired and processed with a pre-trained recognition model to obtain a target fruit recognition result; during this processing, a spatial positional encoding is added to the extracted image features to compensate for the lost spatial information. By using a Sparse-Transformer encoder-decoder model, the invention addresses the poor fruit-detection efficiency and the insensitivity to small targets of fruit-picking-robot vision systems; it achieves high precision and high speed, better meeting agricultural needs such as fruit-picking robots and yield prediction. A small-target enhancement technique expands the sample space, so the method adapts well to small-sample data sets and has strong generalization capability.

Description

Method and system for identifying target fruits under homochromatic system background
Technical Field
The invention relates to the technical field of computer vision, and in particular to a Sparse-Transformer-based, small-target-sensitive method and system for identifying target fruits under a same-color-system background.
Background
In agricultural production, machine vision is widely applied to fields such as fruit and vegetable yield prediction, automatic picking, and pest and disease identification, and the precision and efficiency of target detection have become the key constraints on the performance of the operating equipment. To date, success has been achieved in detecting static target fruits, dynamic target fruits, and occluded or overlapping target fruits.
Most existing detection models are based either on traditional machine learning or on emerging deep network models. Machine-learning-based detection methods rely mainly on target-fruit features such as color and shape; they work reasonably well for targets that differ strongly from the background, but for green target fruits, whose color is similar to the background, detection is comparatively poor. Deep-learning-based detection methods depend heavily on the number of training samples, and in real orchard environments some orchards cannot supply enough samples to train an accurate detection model. In a complex orchard environment, changing fruit postures, green target fruits, and insufficient samples caused by hard-to-collect environmental data all pose great challenges to accurate target detection.
Machine-learning-based identification methods usually involve preprocessing, feature selection, and similar operations, cannot realize an end-to-end detection process, and are easily disturbed by interference in the natural environment. Deep-learning-based identification methods markedly improve precision and do realize end-to-end detection, but convolution operations and the model's dependence on anchor boxes consume large amounts of computation and storage, so the recognition speed cannot meet real-time requirements.
Disclosure of Invention
The invention aims to provide a method and a system for identifying target fruits under a same-color-system background which, on the premise of guaranteed precision, exploit the small-target sensitivity and parallel-computing characteristics of the Sparse-Transformer to improve speed, reduce training time, and optimize small-target detection precision and speed, better meeting agricultural needs such as fruit-picking robots and yield prediction, so as to solve at least one technical problem in the background art.
In order to achieve the purpose, the invention adopts the following technical scheme:
in one aspect, the invention provides a method for identifying target fruits in the background of the same color system, which comprises the following steps:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set comprising a plurality of orchard environment images and labels annotating the target fruits in those images;
wherein,
when the orchard environment image to be recognized is processed with the pre-trained recognition model, a spatial positional encoding is added to the extracted image features to compensate for the lost spatial information.
Preferably, training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features; constructing a Sparse-Transformer model to process those features; passing the result through a feedforward neural network and outputting the final detection result; then inputting test samples, evaluating the detection results with the evaluation indices, adjusting the model parameters according to the evaluation, and repeatedly training the improved model until the optimal network model is obtained.
Preferably, a single-lens reflex camera is used to collect green target-fruit images under different illumination, time periods, and angles; a small-target enhancement technique copies target fruits smaller than a preset pixel size within each image to expand the samples, which are then classified and labeled to construct a data set; the expanded data set is divided into a training set, a validation set, and a test set.
Preferably, constructing the encoder of the Sparse-Transformer model comprises: replacing the attention module that processes the feature map in the Transformer mechanism with a dilated (atrous) self-attention module; processing the image features and reducing their dimension, adding a spatial positional encoding to compensate for the lost spatial information, feeding the result through the dilated self-attention mechanism with a residual module and a regularization layer, and outputting the encoder result through a feedforward neural network with a residual module and a regularization layer.
Preferably, constructing the decoder of the Sparse-Transformer model comprises: feeding the parameters learned by the encoder through a dilated self-attention mechanism with a residual module and a regularization layer, feeding that result through a multi-head self-attention mechanism with a residual module and a regularization layer, and obtaining the detection result through a feedforward neural network with a residual module and a regularization layer.
Preferably, the feedforward neural network computes its result through a multi-layer perceptron with a ReLU activation function and a hidden dimension, followed by a linear projection layer.
Preferably, a final loss function is constructed from the Hungarian loss function and the SoftMax loss function, the network model is optimized, and model training is carried out.
In a second aspect, the present invention provides a system for identifying a target fruit in a same color family background, comprising:
the acquiring module is used for acquiring an orchard environment image to be identified;
the recognition module is used for processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set comprising a plurality of orchard environment images and labels annotating the target fruits in those images;
wherein,
when the orchard environment image to be recognized is processed with the pre-trained recognition model, a spatial positional encoding is added to the extracted image features to compensate for the lost spatial information.
In a third aspect, the present invention provides a non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the method for identifying a target fruit in a homochromatic context as described above.
In a fourth aspect, the present invention provides an electronic device comprising: a processor, a memory, and a computer program; wherein the processor is connected with the memory, the computer program is stored in the memory, and when the electronic device runs, the processor executes the computer program stored in the memory, so as to make the electronic device execute the instruction for realizing the target fruit identification method in the same color system background.
The invention has the following beneficial effects: the Sparse-Transformer encoder-decoder model solves the poor fruit-detection efficiency and small-target insensitivity of fruit-picking-robot vision systems; the method is precise and fast, better meeting agricultural needs such as fruit-picking robots and yield prediction; and the small-target enhancement technique expands the sample space, so the method adapts well to small-sample data sets and has strong generalization capability.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart illustrating training of a recognition model in a target fruit recognition method in the same color system background according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a Sparse-transformer encoder of the Sparse transformer model according to the embodiment of the present invention.
Fig. 3 is a structural diagram of a Sparse-transformer decoder according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the effect of the feedforward neural network FNN according to the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by way of the drawings are illustrative only and are not to be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
For the purpose of facilitating an understanding of the present invention, the present invention will be further explained by way of specific embodiments with reference to the accompanying drawings, which are not intended to limit the present invention.
It should be understood by those skilled in the art that the drawings are merely schematic representations of embodiments and that the elements shown in the drawings are not necessarily required to practice the invention.
Example 1
This embodiment 1 provides a target fruit identification system under the background of the same color system, which includes:
the acquiring module is used for acquiring an orchard environment image to be identified;
the recognition module is used for processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set comprising a plurality of orchard environment images and labels annotating the target fruits in those images;
wherein,
when the orchard environment image to be recognized is processed with the pre-trained recognition model, a spatial positional encoding is added to the extracted image features to compensate for the lost spatial information.
In this embodiment 1, the method for identifying a target fruit in a homochromatic background is implemented by using the above system for identifying a target fruit in a homochromatic background, and includes:
using the acquiring module to acquire an orchard environment image to be identified; for example, a Canon single-lens reflex camera can be used to acquire the orchard environment image.
Processing the orchard environment image to be recognized with the recognition module and a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set comprising a plurality of orchard environment images and labels annotating the target fruits in those images. When the image to be recognized is processed with the pre-trained recognition model, a spatial positional encoding is added to the extracted image features to compensate for the lost spatial information.
In this embodiment 1, training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features; constructing a Sparse-Transformer model to process those features; passing the result through a feedforward neural network and outputting the final detection result; then inputting test samples, evaluating the detection results with the evaluation indices, adjusting the model parameters according to the evaluation, and repeatedly training the improved model until the optimal network model is obtained.
Making the data set for training the model comprises: collecting green target-fruit images under different illumination, time periods, and angles with a single-lens reflex camera; copying target fruits smaller than a preset pixel size within each image with a small-target enhancement technique to expand the samples; classifying and labeling them to construct a data set; and dividing the expanded data set into a training set, a validation set, and a test set.
Constructing the encoder of the Sparse-Transformer model comprises: replacing the attention module that processes the feature map in the Transformer mechanism with a dilated (atrous) self-attention module; processing the image features and reducing their dimension, adding a spatial positional encoding to compensate for the lost spatial information, feeding the result through the dilated self-attention mechanism with a residual module and a regularization layer, and outputting the encoder result through a feedforward neural network with a residual module and a regularization layer.
Constructing the decoder of the Sparse-Transformer model comprises: feeding the parameters learned by the encoder through a dilated self-attention mechanism with a residual module and a regularization layer, feeding that result through a multi-head self-attention mechanism with a residual module and a regularization layer, and obtaining the detection result through a feedforward neural network with a residual module and a regularization layer.
The feedforward neural network computes its result through a multi-layer perceptron with a ReLU activation function and a hidden dimension, followed by a linear projection layer. A final loss function is constructed from the Hungarian loss function and the SoftMax loss function, the network model is optimized, and the model is trained.
Example 2
This embodiment 2 provides a method for identifying target fruits under a same-color-system background, which comprises:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set comprising a plurality of orchard environment images and labels annotating the target fruits in those images. When the image to be recognized is processed with the pre-trained recognition model, a spatial positional encoding is added to the extracted image features to compensate for the lost spatial information.
In this embodiment 2, training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features; constructing a Sparse-Transformer model to process those features; passing the result through a feedforward neural network and outputting the final detection result; then inputting test samples, evaluating the detection results with the evaluation indices, adjusting the model parameters according to the evaluation, and repeatedly training the improved model until the optimal network model is obtained.
As shown in fig. 1, specifically, images of green target fruits against a green background are first collected, preprocessed, and target-labeled to generate a data set; the small-target enhancement technique copies target fruits smaller than 64×64 pixels within each image, preprocessing the data, expanding the samples, and improving model precision; a Sparse-Transformer encoder-decoder network model is constructed, and a feedforward neural network is built to predict the final result; a loss function is constructed to optimize the result; finally, test samples are input, the detection results of the green-target-fruit detection model are evaluated with the evaluation indices, and the model parameters are adjusted according to the evaluation; the improved model is then trained repeatedly until the optimal network model is obtained.
The creation of the data set for training the model comprises: collecting green target-fruit images under different illumination, time periods, and angles with a single-lens reflex camera; copying target fruits smaller than a preset pixel size within each image with a small-target enhancement technique to expand the samples; classifying and labeling them to construct a data set; and dividing the expanded data set into a training set, a validation set, and a test set. Specifically, for image acquisition and classification, a Canon EOS 80D single-lens reflex camera is used to collect abundant green-fruit images in an orchard environment, and the collected images are classified to ease data-set processing. The data are preprocessed by copying target fruits smaller than 64×64 pixels within the images using the small-target enhancement technique. The images are annotated with the LabelMe software, each target fruit being labeled as an independent connected domain, to produce a COCO-format data set.
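The copy-based small-target enhancement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the (x, y, w, h) box format, and the random-paste policy are assumptions. Every labeled fruit smaller than 64×64 pixels is copied to a random location in the same image, and a matching label is appended.

```python
import numpy as np

def augment_small_targets(image, boxes, max_size=64, copies=1, rng=None):
    """Copy-paste augmentation: duplicate every target smaller than
    max_size x max_size pixels to a random location in the image.
    boxes are (x, y, w, h) in pixel coordinates; returns the augmented
    image and the enlarged box list."""
    rng = np.random.default_rng(rng)
    out = image.copy()
    new_boxes = list(boxes)
    H, W = image.shape[:2]
    for (x, y, w, h) in boxes:
        if w >= max_size or h >= max_size:
            continue  # only small targets are duplicated
        patch = image[y:y + h, x:x + w]
        for _ in range(copies):
            nx = int(rng.integers(0, W - w))
            ny = int(rng.integers(0, H - h))
            out[ny:ny + h, nx:nx + w] = patch
            new_boxes.append((nx, ny, w, h))
    return out, new_boxes
```

In practice the paste location would also be checked against existing boxes to avoid occluding other labeled fruits; that check is omitted here for brevity.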
Constructing the encoder of the Sparse-Transformer model comprises: replacing the attention module that processes the feature map in the Transformer mechanism with a dilated (atrous) self-attention module; processing the image features and reducing their dimension, adding a spatial positional encoding to compensate for the lost spatial information, feeding the result through the dilated self-attention mechanism with a residual module and a regularization layer, and outputting the encoder result through a feedforward neural network with a residual module and a regularization layer.
Constructing the decoder of the Sparse-Transformer model comprises: feeding the parameters learned by the encoder through a dilated self-attention mechanism with a residual module and a regularization layer, feeding that result through a multi-head self-attention mechanism with a residual module and a regularization layer, and obtaining the detection result through a feedforward neural network with a residual module and a regularization layer.
The feedforward neural network computes its result through a multi-layer perceptron with a ReLU activation function and a hidden dimension, followed by a linear projection layer.
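Such a prediction head can be sketched numerically as follows. The dimensions (256-dimensional model, 2048-dimensional hidden layer, 4 box outputs) and the random initialization are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def ffn_head(x, w1, b1, w2, b2, w_proj, b_proj):
    """Two-layer perceptron with ReLU and a hidden dimension,
    followed by a linear projection layer."""
    h = np.maximum(x @ w1 + b1, 0.0)   # hidden layer, ReLU activation
    h = h @ w2 + b2                    # back to the model dimension
    return h @ w_proj + b_proj         # linear projection layer

rng = np.random.default_rng(0)
d_model, d_hidden, n_out = 256, 2048, 4    # n_out=4: one (cx, cy, w, h) box
x = rng.standard_normal((100, d_model))    # 100 decoder output embeddings
out = ffn_head(
    x,
    0.02 * rng.standard_normal((d_model, d_hidden)), np.zeros(d_hidden),
    0.02 * rng.standard_normal((d_hidden, d_model)), np.zeros(d_model),
    0.02 * rng.standard_normal((d_model, n_out)), np.zeros(n_out),
)
```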
Specifically, in this embodiment 2, a network backbone is constructed to extract features. The conventional CNN backbone starts from the initial image $x \in \mathbb{R}^{3 \times H_0 \times W_0}$ (3 color channels) and generates a low-resolution activation-map feature $f \in \mathbb{R}^{C \times H \times W}$. The feature values used in this embodiment 2 are $C = 2048$, $H = H_0/32$, $W = W_0/32$.
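Assuming the stride-32 backbone implied by these values, the feature-map shape follows directly; a trivial helper (not from the patent) makes the arithmetic explicit:

```python
def backbone_feature_shape(h0, w0, channels=2048, stride=32):
    """Shape (C, H, W) of the low-resolution activation map produced
    by a stride-32 CNN backbone from a 3 x H0 x W0 input image."""
    return channels, h0 // stride, w0 // stride
```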
As shown in fig. 2, constructing the Sparse-Transformer encoder comprises: using the dilated (atrous) self-attention module in place of the attention module in the Transformer mechanism that processes the feature map. The image features are processed and reduced in dimension, a spatial positional encoding is added to compensate for the lost spatial information, the result is fed through the dilated self-attention mechanism with a residual module and a regularization layer, and the encoder result is output through a feedforward neural network with a residual module and a regularization layer. As shown in fig. 4, the result improves after processing by the feedforward neural network FNN.
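The spatial positional encoding added before the encoder can be sketched with fixed sinusoids, a common choice for Transformer detectors; the patent does not fix the exact formula, so everything below is an assumed illustration. Half of the channels encode the row index and half the column index of each flattened feature-map cell.

```python
import numpy as np

def positional_encoding_2d(h, w, d_model):
    """Fixed sinusoidal spatial positional encoding for an h x w feature
    map flattened to h*w tokens. Returns an (h*w, d_model) array."""
    assert d_model % 4 == 0
    d = d_model // 2

    def encode(pos, d):
        i = np.arange(d // 2)
        freq = 1.0 / (10000 ** (2 * i / d))          # geometric frequencies
        ang = pos[:, None] * freq[None, :]
        return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)

    row = encode(np.arange(h, dtype=float), d)       # (h, d): row identity
    col = encode(np.arange(w, dtype=float), d)       # (w, d): column identity
    return np.concatenate(
        [np.repeat(row, w, axis=0), np.tile(col, (h, 1))], axis=1)
```

The encoding is simply added to the flattened feature tokens before self-attention, so the model can recover the 2-D layout that flattening destroys.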
As shown in fig. 3, constructing the Sparse-Transformer decoder comprises: the decoder is built from several attention mechanisms, namely a multi-head self-attention mechanism and a dilated self-attention mechanism. The parameters learned by the encoder are first fed through the dilated self-attention mechanism with a residual module and a regularization layer; the result is then fed through the multi-head self-attention mechanism with a residual module and a regularization layer, and the detection result is obtained through a feedforward neural network with a residual module and a regularization layer.
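The multi-head self-attention step can be sketched numerically as follows. This is a generic single-layer sketch under assumed dimensions, not the patent's exact configuration, and it omits the residual and regularization layers described above.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    """Minimal multi-head self-attention.
    x: (n, d_model); wq/wk/wv/wo: (d_model, d_model) projections."""
    n, d = x.shape
    dh = d // n_heads
    # project and split into heads: (n_heads, n, dh)
    q = (x @ wq).reshape(n, n_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, n_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, n_heads, dh).transpose(1, 0, 2)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, n)
    out = (att @ v).transpose(1, 0, 2).reshape(n, d)       # merge heads
    return out @ wo
```

The dilated variant would restrict each query to a strided subset of key positions, which is what makes the attention "sparse".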
In this embodiment 2, the model is evaluated and the network model is optimized: test samples are input, the detection results of the green-fruit detection model are evaluated with the evaluation indices, the model parameters are adjusted according to the evaluation, and the improved model is trained repeatedly until the optimal network model is obtained. The specific process is as follows:
The model is evaluated with recall and precision, which provide the basis for optimizing the model; training and evaluation are repeated according to the recall and precision until an optimized result is obtained.
In this embodiment 2, a final loss function is constructed from the Hungarian loss function and the SoftMax loss function, the network model is optimized, and model training is performed. The specific steps are as follows:
Using $y$ to denote the ground-truth set and $\hat{y}$ the prediction set, a bipartite matching between the two sets is found with the following formula:

$\hat{\sigma} = \arg\min_{\sigma \in \mathfrak{S}_N} \sum_{i=1}^{N} \mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)})$

where $\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)})$ is the pairwise matching loss between the ground truth $y_i$ and the prediction with index $\sigma(i)$, $\mathfrak{S}_N$ denotes the permutations of $N$ elements, and $N$ is the fixed size of the prediction set. The optimization works on the basis of the Hungarian algorithm.
The Softmax function, used frequently in deep learning, maps several input numbers to real numbers between 0 and 1 that sum to 1 after normalization. Its formula is:

$S_i = \dfrac{e^{V_i}}{\sum_{j=1}^{T} e^{V_j}}$

where $T$ is the number of elements; the ratio of the exponential of each element to the sum of the exponentials of all elements is computed.
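The formula is a one-liner in practice; the max-shift in the sketch below is a standard numerical-stability trick not mentioned in the text:

```python
import numpy as np

def softmax(v):
    """S_i = exp(V_i) / sum_j exp(V_j): maps T real inputs to values
    in (0, 1) that sum to 1. Shifting by max(v) avoids overflow
    without changing the result."""
    e = np.exp(v - np.max(v))
    return e / e.sum()
```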
That is, the loss function is:

$\mathcal{L}_{Hungarian}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}} \, \mathcal{L}_{box}(b_i, \hat{b}_{\hat{\sigma}(i)}) \right]$
step 4.3: will l1Loss function and GLOU loss function
Figure BDA0003315124950000107
Combining the two functions on the basis of scale invariance to establish a boundary frame loss function of the user and combine the boundary frame loss function with the boundary frame loss function
Figure BDA0003315124950000108
Is defined as:
Figure BDA0003315124950000109
$l_1$ loss function: the images are compared pixel by pixel and the absolute value of the difference is taken, where x(p) denotes a pixel of the original image and y(p) the corresponding pixel of the computed image. The formula is:

$$\mathcal{L}_1 = \sum_{p} \left| x(p) - y(p) \right|$$
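The pixel-wise comparison can be sketched directly; the tiny flattened "images" below are hypothetical values chosen only to exercise the formula:

```python
def l1_loss(x, y):
    """Sum of pixel-wise absolute differences |x(p) - y(p)|.

    In practice this sum is often further normalized, e.g. by the
    number of objects in the batch as described in the text.
    """
    assert len(x) == len(y)
    return sum(abs(a - b) for a, b in zip(x, y))

# Two tiny flattened "images" (hypothetical pixel values).
loss = l1_loss([0.0, 0.5, 1.0, 0.25], [0.1, 0.5, 0.8, 0.25])
```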
The GIoU loss function $\mathcal{L}_{iou}$ is shown below, where A and B denote the two bounding-box regions and C is the smallest box enclosing both:

$$\mathcal{L}_{iou} = 1 - \left( \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|} \right)$$

$\lambda_{iou} \in \mathbb{R}$ and $\lambda_{L1} \in \mathbb{R}$ are hyperparameters, and both terms are normalized by the number of objects in the batch; $\mathcal{L}_1$ denotes the $l_1$ loss function.
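A minimal sketch of the GIoU loss for axis-aligned boxes in (x1, y1, x2, y2) form; the helper is ours and follows the standard GIoU definition rather than any patent-specific variant:

```python
def giou_loss(a, b):
    """Generalised-IoU loss for axis-aligned boxes (x1, y1, x2, y2).

    L_giou = 1 - (IoU - |C \\ (A u B)| / |C|), where C is the smallest
    box enclosing both A and B; unlike plain IoU it stays informative
    even when the boxes do not overlap.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest enclosing box C.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou

# Identical boxes give zero loss; disjoint boxes give a loss above 1.
zero = giou_loss((0, 0, 2, 2), (0, 0, 2, 2))
```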
In summary, embodiment 2 uses a Sparse-Transformer encoder-decoder model to address two weaknesses of fruit-picking robot vision systems: poor fruit-detection efficiency and insensitivity to small targets. The method is accurate and fast, and thus better meets agricultural needs such as fruit-picking robots and yield prediction. A small-target enhancement technique expands the sample space, so the model adapts well to small datasets, generalizes strongly, and can be applied to the vision systems of robots that pick, or estimate the yield of, a variety of fruits.
Example 3
Embodiment 3 provides a fruit picking robot that includes a system for identifying target fruits against a background of the same color family; the system implements the identification method described above, comprising:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set, wherein the training set comprises a plurality of orchard environment images and labels marking the target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, spatial position codes are added to the extracted image features to compensate for the lost information.
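The patent does not disclose the exact form of its spatial position code, so the standard sinusoidal positional encoding of Transformers is used below only as a plausible stand-in; both function names are ours:

```python
import math

def sincos_encoding(pos, d_model):
    """1-D sinusoidal positional code of length d_model for index pos.

    Even dimensions use sine, odd dimensions cosine, with wavelengths
    forming a geometric progression -- the standard Transformer scheme.
    """
    enc = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def add_position(features):
    """Add a positional code to each vector of a flattened feature map,
    so attention layers can recover where each feature came from."""
    d = len(features[0])
    return [[f + p for f, p in zip(vec, sincos_encoding(pos, d))]
            for pos, vec in enumerate(features)]

# Three zero feature vectors: after encoding, each position is distinct.
encoded = add_position([[0.0, 0.0, 0.0, 0.0]] * 3)
```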
Example 4
Embodiment 4 of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that, when executed by a processor, implement the method for identifying a target fruit in a same-color-family background as described above, the method comprising:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set, wherein the training set comprises a plurality of orchard environment images and labels marking the target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, spatial position codes are added to the extracted image features to compensate for the lost information.
Example 5
Embodiment 5 of the present invention provides a computer program (product) comprising a computer program, which when run on one or more processors, is configured to implement a method for identifying a target fruit in a homochromatic background as described above, the method comprising:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set, wherein the training set comprises a plurality of orchard environment images and labels marking the target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, spatial position codes are added to the extracted image features to compensate for the lost information.
Example 6
An embodiment 6 of the present invention provides an electronic device, including: a processor, a memory, and a computer program; wherein a processor is connected to the memory, a computer program is stored in the memory, and when the electronic device runs, the processor executes the computer program stored in the memory to make the electronic device execute the instructions for implementing the target fruit identification method in the same color family background as described above, the method includes:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set, wherein the training set comprises a plurality of orchard environment images and labels marking the target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, spatial position codes are added to the extracted image features to compensate for the lost information.
In summary, the method and the system for identifying the target fruit in the homochromatic background according to the embodiments of the present invention use the Sparse-transformer encoder-decoder model to solve the problems of poor fruit detection efficiency and insensitivity to small target in the visual system of the fruit picking robot. The identification precision is high, the speed is high, and the agricultural requirements of fruit picking robots, yield prediction and the like are well met. The small target enhancement technology is used for expanding the sample space, the small sample data set is well adapted, the generalization capability is strong, and the method can be applied to robot vision systems for picking or pre-producing various fruits.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of the invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solutions disclosed herein.

Claims (10)

1. A method for identifying target fruits in the same color system background is characterized by comprising the following steps:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set, wherein the training set comprises a plurality of orchard environment images and labels marking the target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, spatial position codes are added to the extracted image features to compensate for the lost information.
2. The method of claim 1, wherein training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features; constructing a sparse Transformer model to process the features; processing through a feedforward neural network and outputting the final detection result; and inputting test samples, evaluating the obtained detection results with evaluation indices, adjusting the model parameters according to the evaluation results, and repeatedly training the improved model until an optimal network model is obtained.
3. The method for identifying target fruits in the same color family background as claimed in claim 2, wherein a single-lens reflex camera is used to collect images of green target fruits under different illumination, in different time periods and from different angles; target fruits smaller than a preset number of pixels in the images are copied with a small-target enhancement technique to expand the samples, which are classified and labeled to construct a data set; and the expanded data set is divided into a training set, a validation set and a test set.
4. The method for identifying target fruits in the same color family background as claimed in claim 2, wherein the encoder of the constructed sparse Transformer model: replaces the attention module that processes feature maps in the Transformer mechanism with a dilated (atrous) self-attention module; processes the image features and reduces their dimension, adds spatial position codes to compensate for lost information, and inputs the result into the dilated self-attention mechanism, a residual module and a regularization layer for processing; and outputs the encoder result through a feedforward neural network, a residual module and a regularization layer.
5. The method for identifying target fruits in the same color family background as claimed in claim 4, wherein the decoder of the constructed sparse Transformer model: inputs the parameters learned by the encoder into a dilated self-attention mechanism, a residual module and a regularization layer for processing; inputs the processed result into a multi-head self-attention mechanism, a residual module and a regularization layer; and obtains the detection result through a feedforward neural network, a residual module and a regularization layer.
6. The method of claim 5, wherein the feedforward neural network computes the result through a multi-layer perceptron with a ReLU activation function and a hidden dimension, followed by a linear projection layer.
7. The method for identifying the target fruit under the homochromatic system background as claimed in claim 2, wherein a Hungarian loss function and a SoftMax loss function are used for constructing a final loss function, optimizing a network model and performing model training.
8. A system for identifying a target fruit in a homochromatic background, comprising:
the acquiring module is used for acquiring an orchard environment image to be identified;
the recognition module is used for processing the orchard environment image to be recognized with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training on a training set, wherein the training set comprises a plurality of orchard environment images and labels marking the target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, spatial position codes are added to the extracted image features to compensate for the lost information.
9. A non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the method of identifying a target fruit in a homochromatic context of any of claims 1-6.
10. An electronic device, comprising: a processor, a memory, and a computer program; wherein a processor is connected to a memory, a computer program being stored in the memory, the processor executing the computer program stored in the memory when the electronic device is running, to cause the electronic device to execute instructions to implement the method for identifying a target fruit in a homochromatic context as claimed in any of the claims 1-6.
CN202111228465.9A 2021-10-21 2021-10-21 Method and system for identifying target fruits under homochromatic system background Pending CN114187590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111228465.9A CN114187590A (en) 2021-10-21 2021-10-21 Method and system for identifying target fruits under homochromatic system background


Publications (1)

Publication Number Publication Date
CN114187590A true CN114187590A (en) 2022-03-15

Family

ID=80539840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111228465.9A Pending CN114187590A (en) 2021-10-21 2021-10-21 Method and system for identifying target fruits under homochromatic system background

Country Status (1)

Country Link
CN (1) CN114187590A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210224998A1 (en) * 2018-11-23 2021-07-22 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, and system and storage medium
CN111210010A (en) * 2020-01-15 2020-05-29 上海眼控科技股份有限公司 Data processing method and device, computer equipment and readable storage medium
CN113076819A (en) * 2021-03-17 2021-07-06 山东师范大学 Fruit identification method and device under homochromatic background and fruit picking robot
CN113269182A (en) * 2021-04-21 2021-08-17 山东师范大学 Target fruit detection method and system based on small-area sensitivity of variant transform
CN113221874A (en) * 2021-06-09 2021-08-06 上海交通大学 Character recognition system based on Gabor convolution and linear sparse attention

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIA Weikuan et al.: "Efficient detection model for green target fruits based on an optimized Transformer network", Transactions of the Chinese Society of Agricultural Engineering, vol. 37, no. 14, 23 July 2021 (2021-07-23), pages 163-170 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663814A (en) * 2022-03-28 2022-06-24 安徽农业大学 Fruit detection and yield estimation method and system based on machine vision
CN114700941A (en) * 2022-03-28 2022-07-05 中科合肥智慧农业协同创新研究院 Strawberry picking method based on binocular vision and robot system
CN114700941B (en) * 2022-03-28 2024-02-27 中科合肥智慧农业协同创新研究院 Strawberry picking method based on binocular vision and robot system
CN114663814B (en) * 2022-03-28 2024-08-23 安徽农业大学 Fruit detection and yield estimation method and system based on machine vision
CN115952830A (en) * 2022-05-18 2023-04-11 北京字跳网络技术有限公司 Data processing method and device, electronic equipment and storage medium
CN115952830B (en) * 2022-05-18 2024-04-30 北京字跳网络技术有限公司 Data processing method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111768432B (en) Moving target segmentation method and system based on twin deep neural network
Mathur et al. Crosspooled FishNet: transfer learning based fish species classification model
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN108021947B (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN114187590A (en) Method and system for identifying target fruits under homochromatic system background
CN112364931B (en) Few-sample target detection method and network system based on meta-feature and weight adjustment
CN114332621B (en) Disease and pest identification method and system based on multi-model feature fusion
Lee et al. Plant Identification System based on a Convolutional Neural Network for the LifeClef 2016 Plant Classification Task.
CN114187450A (en) Remote sensing image semantic segmentation method based on deep learning
CN113920472B (en) Attention mechanism-based unsupervised target re-identification method and system
CN113269182A (en) Target fruit detection method and system based on small-area sensitivity of variant transform
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN111125397B (en) Cloth image retrieval method based on convolutional neural network
CN113034506A (en) Remote sensing image semantic segmentation method and device, computer equipment and storage medium
CN114359631A (en) Target classification and positioning method based on coding-decoding weak supervision network model
CN116434045B (en) Intelligent identification method for tobacco leaf baking stage
CN114612802A (en) System and method for classifying fine granularity of ship target based on MBCNN
CN113420173A (en) Minority dress image retrieval method based on quadruple deep learning
CN113076819A (en) Fruit identification method and device under homochromatic background and fruit picking robot
CN113496221B (en) Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering
CN114463614A (en) Significance target detection method using hierarchical significance modeling of generative parameters
CN113011506A (en) Texture image classification method based on depth re-fractal spectrum network
Si et al. Image semantic segmentation based on improved DeepLab V3 model
CN117872127A (en) Motor fault diagnosis method and equipment
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination