CN114187590A - Method and system for identifying target fruits under homochromatic system background - Google Patents
Method and system for identifying target fruits under homochromatic system background
- Publication number
- CN114187590A (application CN202111228465.9A)
- Authority
- CN
- China
- Prior art keywords
- target
- model
- processing
- orchard environment
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Pattern recognition: classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 — Neural networks: combinations of networks
- G06N3/047 — Neural networks: probabilistic or stochastic networks
- G06N3/048 — Neural networks: activation functions
- G06N3/08 — Neural networks: learning methods
Abstract
The invention provides a method and a system for identifying target fruits against a background of the same color family, belonging to the technical field of computer vision. An orchard environment image to be recognized is acquired and processed with a pre-trained recognition model to obtain a target fruit recognition result; when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information. By using a Sparse-transformer encoder-decoder model, the invention solves the problems of poor fruit detection efficiency and insensitivity to small targets in the vision system of the fruit picking robot; the precision is high and the speed is fast, which better meets agricultural needs such as fruit picking robots and yield prediction; a small-target enhancement technique is used to expand the sample space, so the method adapts well to small-sample data sets and has strong generalization ability.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and a system for identifying target fruits against a homochromatic background based on the small-target sensitivity of the Sparse-transformer.
Background
In agricultural production, machine vision is widely applied to fields such as fruit and vegetable yield prediction, automatic picking, and pest and disease identification, and the precision and efficiency of target detection have become key factors restricting the performance of the operating equipment. Detection of static, dynamic, and occluded or overlapping target fruits has already achieved some success.
Most existing detection models are based on either traditional machine learning or emerging deep network models. Machine-learning-based detection mainly relies on features of the target fruit such as color and shape; it works well when the target differs markedly from the background, but when green target fruits are encountered the fruit color is close to the background color and the detection effect is relatively poor. Deep-learning-based detection depends heavily on the number of training samples, and in real orchard environments some orchards cannot provide enough samples to train an accurate detection model. In a complex orchard environment, changing fruit postures, green fruit color, and insufficient samples caused by the difficulty of acquiring some environmental data all pose great challenges to accurate target detection.
Machine-learning-based recognition usually involves operations such as preprocessing and feature selection, cannot realize an end-to-end detection process, and its recognition effect is easily affected by various kinds of interference in the natural environment. Deep-learning-based recognition clearly improves precision and enables end-to-end detection, but convolution operations and the model's dependence on anchor boxes consume large amounts of computation and storage resources, so the recognition speed cannot meet real-time requirements.
Disclosure of Invention
The invention aims to provide a method and a system for identifying target fruits against a homochromatic background which, on the premise of guaranteed precision, exploit the small-target sensitivity and the parallel-computing characteristics of the Sparse-transformer to increase speed, shorten training time, and optimize small-target detection precision and speed, thereby better meeting agricultural needs such as fruit picking robots and yield prediction, so as to solve at least one of the technical problems in the background art.
In order to achieve the purpose, the invention adopts the following technical scheme:
in one aspect, the invention provides a method for identifying target fruits in the background of the same color system, which comprises the following steps:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
Preferably, training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features, constructing a sparse transformer model to process the features, processing the result with a feedforward neural network, and outputting the final detection result; then inputting test samples, evaluating the obtained detection results with the evaluation indexes, adjusting the model parameters according to the evaluation results, and repeatedly training the improved model until the optimal network model is obtained.
Preferably, a single lens reflex is used for collecting green target fruit images under different illumination, different time periods and different angles; copying target fruits smaller than preset pixels in the image by using a small target enhancement technology to expand the sample, carrying out classification and labeling and constructing a data set; and dividing the expanded data set into a training set, a verification set and a test set.
Preferably, the encoder of the sparse transformer model is constructed as follows: the attention module that processes the feature map in the Transformer mechanism is replaced with a hole (dilated) self-attention module; the image features are processed and reduced in dimension, a spatial position complement code is added to supplement the lost information, and the result is fed into the hole self-attention mechanism with a residual module and a regularization layer for processing; the encoder output is produced through a feedforward neural network with its residual module and regularization layer.
Preferably, the decoder of the sparse transformer model is constructed as follows: the parameters learned by the encoder are input into a hole (dilated) self-attention mechanism with a residual module and a regularization layer for processing; the processed result is fed into a multi-head self-attention mechanism with a residual module and a regularization layer, and the output is processed by a feedforward neural network with its residual module and regularization layer to obtain the detection result.
Preferably, the feedforward neural network computes the result by a multi-layered perceptron with a ReLU activation function and hidden dimensions, and a linear projection layer.
Preferably, a final loss function is constructed by using the Hungarian loss function and the SoftMax loss function, a network model is optimized, and model training is carried out.
In a second aspect, the present invention provides a system for identifying a target fruit in a same color family background, comprising:
the acquiring module is used for acquiring an orchard environment image to be identified;
the recognition module is used for processing the orchard environment image to be recognized by utilizing a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
In a third aspect, the present invention provides a non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the method for identifying a target fruit in a homochromatic context as described above.
In a fourth aspect, the present invention provides an electronic device comprising: a processor, a memory, and a computer program; wherein the processor is connected with the memory, the computer program is stored in the memory, and when the electronic device runs, the processor executes the computer program stored in the memory, so as to make the electronic device execute the instruction for realizing the target fruit identification method in the same color system background.
The invention has the beneficial effects that: by using a Sparse-transformer encoder-decoder model, the method solves the problems of poor fruit detection efficiency and insensitivity to small targets in the vision system of the fruit picking robot; the precision is high and the speed is fast, which better meets agricultural needs such as fruit picking robots and yield prediction; the small-target enhancement technique is used to expand the sample space, so the method adapts well to small-sample data sets and has strong generalization ability.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart illustrating training of a recognition model in a target fruit recognition method in the same color system background according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a Sparse-transformer encoder of the Sparse transformer model according to the embodiment of the present invention.
Fig. 3 is a structural diagram of a Sparse-transformer decoder according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the effect of the feedforward neural network FNN according to the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by way of the drawings are illustrative only and are not to be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
For the purpose of facilitating an understanding of the present invention, the present invention will be further explained by way of specific embodiments with reference to the accompanying drawings, which are not intended to limit the present invention.
It should be understood by those skilled in the art that the drawings are merely schematic representations of embodiments and that the elements shown in the drawings are not necessarily required to practice the invention.
Example 1
This embodiment 1 provides a target fruit identification system under the background of the same color system, which includes:
the acquiring module is used for acquiring an orchard environment image to be identified;
the recognition module is used for processing the orchard environment image to be recognized by utilizing a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
In this embodiment 1, the method for identifying a target fruit in a homochromatic background is implemented by using the above system for identifying a target fruit in a homochromatic background, and includes:
using an acquisition module to acquire an orchard environment image to be identified; for example, a Canon single-lens reflex camera can be used to capture the orchard environment image to be identified.
The orchard environment image to be recognized is processed by the recognition module with a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images. When the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
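As a concrete illustration of this recognition step, the following is a minimal inference sketch, assuming a PyTorch implementation; the model output format (class logits plus boxes), the preprocessing values, and the 0.7 score threshold are illustrative assumptions rather than details taken from the patent.

```python
# Hedged inference sketch: "model" is assumed to return (class_logits, boxes);
# the resize, normalization statistics and score threshold are assumptions.
import torch
import torchvision.transforms as T
from PIL import Image

preprocess = T.Compose([
    T.Resize(800),                       # rescale the orchard image
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406],   # ImageNet statistics, a common default
                [0.229, 0.224, 0.225]),
])

def recognize(model, image_path, score_threshold=0.7):
    """Run the pre-trained recognition model on one orchard environment image."""
    image = Image.open(image_path).convert("RGB")
    x = preprocess(image).unsqueeze(0)           # (1, 3, H, W)
    model.eval()
    with torch.no_grad():
        class_logits, boxes = model(x)           # assumed output: (1, N, classes+1), (1, N, 4)
    scores = class_logits.softmax(-1)[..., :-1].max(-1).values  # drop the "no object" column
    keep = scores[0] > score_threshold
    return boxes[0][keep], scores[0][keep]       # boxes and confidences of detected fruits
```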
In this embodiment 1, training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features, constructing a sparse transformer model to process the features, processing the result with a feedforward neural network, and outputting the final detection result; then inputting test samples, evaluating the obtained detection results with the evaluation indexes, adjusting the model parameters according to the evaluation results, and repeatedly training the improved model until the optimal network model is obtained.
Making a data set of a training model includes: collecting green target fruit images under different illumination, different time periods and different angles by using a single lens reflex; copying target fruits smaller than preset pixels in the image by using a small target enhancement technology to expand the sample, carrying out classification and labeling and constructing a data set; and dividing the expanded data set into a training set, a verification set and a test set.
The encoder of the sparse transformer model is constructed as follows: the attention module that processes the feature map in the Transformer mechanism is replaced with a hole (dilated) self-attention module; the image features are processed and reduced in dimension, a spatial position complement code is added to supplement the lost information, and the result is fed into the hole self-attention mechanism with a residual module and a regularization layer for processing; the encoder output is produced through a feedforward neural network with its residual module and regularization layer.
The decoder of the sparse transformer model is constructed as follows: the parameters learned by the encoder are input into a hole (dilated) self-attention mechanism with a residual module and a regularization layer for processing; the processed result is fed into a multi-head self-attention mechanism with a residual module and a regularization layer, and the output is processed by a feedforward neural network with its residual module and regularization layer to obtain the detection result.
The feed-forward neural network computes results through a multi-layered perceptron with a ReLU activation function and hidden dimensions, and a linear projection layer. And constructing a final loss function by using the Hungarian loss function and the SoftMax loss function, optimizing a network model, and training the model.
Example 2
This embodiment 2 provides a method for identifying target fruits under the background of the same color system, which comprises:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images. When the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
In this embodiment 2, training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features, constructing a sparse transformer model to process the features, processing the result with a feedforward neural network, and outputting the final detection result; then inputting test samples, evaluating the obtained detection results with the evaluation indexes, adjusting the model parameters according to the evaluation results, and repeatedly training the improved model until the optimal network model is obtained.
As shown in fig. 1, specifically, images of green target fruits in a green environment are first collected, preprocessed and target-labeled to generate a data set; a small-target enhancement technique is used to copy target fruits smaller than 64 × 64 pixels in the images, preprocessing the data, expanding the samples and improving model precision; a Sparse-transformer encoder-decoder network model is constructed and a feedforward neural network is built to predict the final result; a loss function is constructed to optimize the result; finally, test samples are input, the detection results of the obtained green target fruit detection model are evaluated with the evaluation indexes, and the model parameters are adjusted according to the evaluation results; the improved model is then trained repeatedly until the optimal network model is obtained. A high-level sketch of this training loop is given below.
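This is a hedged sketch of the Fig. 1 training flow, assuming PyTorch: the criterion (Hungarian matching plus box loss) and the evaluate() helper (e.g. the precision/recall sketch given later in this embodiment) are placeholders, and the optimizer, learning rate, and model-selection rule are assumptions.

```python
# Sketch of the Fig. 1 training flow; "criterion" and "evaluate" are placeholders.
import torch

def train_recognition_model(model, train_loader, test_loader, criterion,
                            epochs=100, lr=1e-4):
    """Repeatedly train, evaluate, and keep the best-performing network."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    best_score, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:
            class_logits, boxes = model(images)              # assumed model outputs
            loss = criterion(class_logits, boxes, targets)   # Hungarian + box loss (placeholder)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        recall, precision = evaluate(model, test_loader)     # placeholder evaluation helper
        score = 0.5 * (recall + precision)                   # simple combined evaluation index
        if score > best_score:                               # keep the best network so far
            best_score, best_state = score, model.state_dict()
    model.load_state_dict(best_state)
    return model
```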
The data set of the training model is created as follows: green target fruit images are collected with a single-lens reflex camera under different illumination, different time periods and different angles; target fruits smaller than a preset pixel size are copied with the small-target enhancement technique to expand the samples, which are then classified and labeled to construct the data set; the expanded data set is divided into a training set, a verification set and a test set. Specifically, for image acquisition and classification, a Canon EOS 80D single-lens reflex camera is used to collect abundant green fruit images in the orchard environment, and the collected images are classified to facilitate data set processing. The data are preprocessed by copying target fruits smaller than 64 × 64 pixels in the image with the small-target enhancement technique (see the sketch below). The images are labeled with LabelMe software, each target fruit is labeled as an independent connected domain, and a COCO-format data set is produced.
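A minimal sketch of this small-target enhancement (copy-and-paste) step; the 64 × 64 threshold comes from the text, while the random paste positions and the single copy per small fruit are assumptions.

```python
# Small-target enhancement sketch: fruits smaller than the threshold are copied
# and pasted elsewhere in the same image; paste positions are assumptions.
import random

def enhance_small_targets(image, boxes, size_threshold=64, copies=1):
    """image: HxWx3 uint8 array; boxes: list of (x, y, w, h) fruit annotations."""
    h_img, w_img = image.shape[:2]
    new_boxes = list(boxes)
    for (x, y, w, h) in boxes:
        if w < size_threshold and h < size_threshold:        # a small green fruit
            patch = image[y:y + h, x:x + w].copy()
            for _ in range(copies):
                nx = random.randint(0, w_img - w)            # random paste position
                ny = random.randint(0, h_img - h)
                image[ny:ny + h, nx:nx + w] = patch          # paste the copied fruit
                new_boxes.append((nx, ny, w, h))             # add a label for the copy
    return image, new_boxes
```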
The encoder of the sparse transformer model is constructed as follows: the attention module that processes the feature map in the Transformer mechanism is replaced with a hole (dilated) self-attention module; the image features are processed and reduced in dimension, a spatial position complement code is added to supplement the lost information, and the result is fed into the hole self-attention mechanism with a residual module and a regularization layer for processing; the encoder output is produced through a feedforward neural network with its residual module and regularization layer.
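For illustration, a possible form of the hole (dilated) self-attention module is sketched below in PyTorch. The exact sparsity pattern is not spelled out in the patent; here every query is restricted to keys whose flattened position differs by a multiple of the dilation rate, which is one common strided-attention scheme and should be read as an assumption.

```python
# Hedged sketch of a dilated ("hole") self-attention module; the masking
# pattern is an assumption, not the patent's exact design.
import torch
import torch.nn as nn

class DilatedSelfAttention(nn.Module):
    """Self-attention restricted to a dilated pattern of positions."""
    def __init__(self, embed_dim, num_heads, dilation=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.dilation = dilation

    def forward(self, x):                        # x: (batch, seq_len, embed_dim)
        seq_len = x.size(1)
        idx = torch.arange(seq_len, device=x.device)
        # a query at position i may only attend to keys j with (i - j) % dilation == 0
        allowed = (idx.unsqueeze(0) - idx.unsqueeze(1)) % self.dilation == 0
        out, _ = self.attn(x, x, x, attn_mask=~allowed)      # True entries are masked out
        return out
```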
The decoder of the sparse transformer model is constructed as follows: the parameters learned by the encoder are input into a hole (dilated) self-attention mechanism with a residual module and a regularization layer for processing; the processed result is fed into a multi-head self-attention mechanism with a residual module and a regularization layer, and the output is processed by a feedforward neural network with its residual module and regularization layer to obtain the detection result.
The feed-forward neural network computes results through a multi-layered perceptron with a ReLU activation function and hidden dimensions, and a linear projection layer.
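A sketch of such a prediction head is given below; the hidden size, the three-layer depth of the box branch, the single fruit class, and the extra "no object" class are assumptions modelled on common set-prediction detectors.

```python
# Hedged sketch of the prediction feed-forward network: an MLP box branch with
# ReLU plus a linear class projection. All sizes are assumptions.
import torch.nn as nn

class PredictionHead(nn.Module):
    def __init__(self, embed_dim=256, hidden_dim=256, num_classes=1):
        super().__init__()
        self.box_mlp = nn.Sequential(                 # 3-layer perceptron with ReLU
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4), nn.Sigmoid(),   # normalized (cx, cy, w, h)
        )
        self.class_proj = nn.Linear(embed_dim, num_classes + 1)  # +1 for "no object"

    def forward(self, decoder_out):                   # (batch, num_queries, embed_dim)
        return self.class_proj(decoder_out), self.box_mlp(decoder_out)
```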
Specifically, in this embodiment 2, a backbone network is constructed to extract features. A conventional CNN backbone starts from the initial image (3 color channels) and generates a low-resolution activation feature map $f \in \mathbb{R}^{C \times H \times W}$. In this embodiment 2, the feature dimension used is $C = 2048$.
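A possible realization of this backbone stage is sketched below: a ResNet-50 trunk produces the C = 2048 feature map, a 1 × 1 convolution reduces the dimension, and a fixed sine/cosine spatial position complement code is added to the flattened features. The reduced dimension d = 256 and the sinusoidal scheme are assumptions.

```python
# Hedged backbone sketch: ResNet-50 trunk (C = 2048) -> 1x1 conv reduction ->
# flatten + spatial position complement code. d_model and the code are assumptions.
import math
import torch
import torch.nn as nn
import torchvision

class Backbone(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        resnet = torchvision.models.resnet50()                     # randomly initialized trunk
        self.trunk = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool and fc
        self.reduce = nn.Conv2d(2048, d_model, kernel_size=1)      # C = 2048 -> d_model

    @staticmethod
    def position_code(length, d_model):
        """Fixed sine/cosine position complement code over flattened positions."""
        pe = torch.zeros(length, d_model)
        pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, x):                          # x: (batch, 3, H0, W0)
        f = self.reduce(self.trunk(x))             # (batch, d_model, H, W)
        b, d, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)      # (batch, H*W, d_model)
        return tokens + self.position_code(h * w, d).to(tokens.device)
```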
As shown in fig. 2, constructing the Sparse-transformer encoder includes: the hole (dilated) self-attention module is used instead of the attention module that processes the feature map in the Transformer mechanism. The image features are processed and reduced in dimension, a spatial position complement code is added to supplement the lost information, and the result is fed into the hole self-attention mechanism with a residual module and a regularization layer for processing; the encoder output is produced through a feedforward neural network with its residual module and regularization layer. As shown in fig. 4, the result is improved after processing by the feedforward neural network FFN.
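A sketch of one such encoder layer, reusing the DilatedSelfAttention module sketched earlier; the layer sizes and the post-norm arrangement are assumptions.

```python
# Hedged sketch of one Sparse-transformer encoder layer (Fig. 2): hole
# self-attention + residual + LayerNorm, then FFN + residual + LayerNorm.
import torch.nn as nn

class SparseEncoderLayer(nn.Module):
    def __init__(self, d_model=256, num_heads=8, ffn_dim=2048, dilation=2):
        super().__init__()
        self.attn = DilatedSelfAttention(d_model, num_heads, dilation)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens):                     # tokens: (batch, H*W, d_model)
        tokens = self.norm1(tokens + self.attn(tokens))   # attention with residual
        tokens = self.norm2(tokens + self.ffn(tokens))    # feed-forward with residual
        return tokens
```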
As shown in fig. 3, constructing the variant Sparse-transformer decoder includes: the Sparse-transformer decoder is built from several attention mechanisms, including a multi-head attention mechanism and a hole (dilated) self-attention mechanism. The parameters learned by the encoder are first input into the hole self-attention mechanism with a residual module and a regularization layer for processing; the result is then fed into a multi-head self-attention mechanism with a residual module and a regularization layer, and finally processed by a feedforward neural network with its residual module and regularization layer to obtain the detection result.
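A sketch of one such decoder layer follows, again reusing the DilatedSelfAttention sketch; the use of learned object queries and the exact cross-attention wiring are assumptions based on the description above.

```python
# Hedged sketch of one Sparse-transformer decoder layer (Fig. 3): hole
# self-attention on queries, multi-head attention over the encoder memory, FFN.
import torch.nn as nn

class SparseDecoderLayer(nn.Module):
    def __init__(self, d_model=256, num_heads=8, ffn_dim=2048, dilation=2):
        super().__init__()
        self.self_attn = DilatedSelfAttention(d_model, num_heads, dilation)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, queries, memory):            # queries: (B, N, d); memory: (B, H*W, d)
        queries = self.norm1(queries + self.self_attn(queries))
        attended, _ = self.cross_attn(queries, memory, memory)
        queries = self.norm2(queries + attended)
        queries = self.norm3(queries + self.ffn(queries))
        return queries
```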
In this embodiment 2, the model is evaluated and the network model is optimized: test samples are input, the detection results of the obtained green fruit detection model are evaluated with the evaluation indexes, the model parameters are adjusted according to the evaluation results, and the improved model is trained repeatedly until the optimal network model is obtained. The specific process is as follows:
and evaluating the model by adopting recall rate and accuracy, and providing basis for optimizing the model. And repeatedly training and model evaluating the model according to the recall rate and the accuracy until an optimized result is obtained.
In this embodiment 2, a final loss function is constructed by using the Hungarian loss function and the SoftMax loss function, the network model is optimized, and model training is performed. The specific steps are as follows:
Using $y$ to denote the ground-truth set and $\hat{y}$ to denote the prediction set, a bipartite matching between the two sets is found with the following formula:

$$\hat{\sigma} = \underset{\sigma \in \mathfrak{S}_N}{\arg\min} \sum_{i=1}^{N} \mathcal{L}_{match}\left(y_i, \hat{y}_{\sigma(i)}\right)$$

where $\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)})$ is the pairwise (binary) matching loss between the true value $y_i$ and the prediction with index $\sigma(i)$, $\mathfrak{S}_N$ denotes the set of permutations of $N$ elements, $N$ is the fixed size of the prediction set, and the optimal assignment is computed with the Hungarian algorithm.

The SoftMax function is frequently used in deep learning; it maps several input numbers to real numbers between 0 and 1 whose sum remains 1 after normalization. It is formulated as:

$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{T} e^{V_j}}$$

where $T$ is the number of elements; the function computes the ratio of the exponential of one element to the sum of the exponentials of all elements.

That is, the loss function is:

$$\mathcal{L}_{Hungarian}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}} \, \mathcal{L}_{box}\!\left(b_i, \hat{b}_{\hat{\sigma}(i)}\right) \right]$$

where $\hat{p}_{\hat{\sigma}(i)}(c_i)$ is the SoftMax class probability that the matched prediction assigns to the true class $c_i$, and $\mathcal{L}_{box}$ is the bounding-box loss defined below.

Step 4.3: the $l_1$ loss function and the GIoU loss function, which is scale-invariant, are combined to establish the bounding-box loss function $\mathcal{L}_{box}$, defined as:

$$\mathcal{L}_{box}\!\left(b_i, \hat{b}_{\sigma(i)}\right) = \lambda_{iou}\,\mathcal{L}_{iou}\!\left(b_i, \hat{b}_{\sigma(i)}\right) + \lambda_{L1}\,\big\|b_i - \hat{b}_{\sigma(i)}\big\|_1$$

The $l_1$ loss function is based on comparing the differences pixel by pixel and taking the absolute value; with $x(p)$ denoting the original image pixels and $y(p)$ the pixels of the image after calculation, it is written as:

$$L_1 = \sum_{p} \big| x(p) - y(p) \big|$$

Here $\lambda_{iou} \in \mathbb{R}$ and $\lambda_{L1} \in \mathbb{R}$ are hyper-parameters, the losses are normalized by the number of objects in the batch, and $L_1$ denotes the $l_1$ loss function.
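The bipartite matching step behind the Hungarian loss can be sketched as follows, using scipy's implementation of the Hungarian algorithm; the cost weights combining the class probability and the l1 box distance are assumptions.

```python
# Hedged sketch of Hungarian matching between predictions and ground truth;
# the cost weights w_class and w_box are assumptions.
import torch
from scipy.optimize import linear_sum_assignment

@torch.no_grad()
def hungarian_match(pred_probs, pred_boxes, gt_classes, gt_boxes,
                    w_class=1.0, w_box=5.0):
    """pred_probs: (N, num_classes+1) SoftMax scores; pred_boxes: (N, 4);
    gt_classes: (M,) long tensor of labels; gt_boxes: (M, 4) tensor.
    Returns the matched (prediction, ground truth) index pairs."""
    cost_class = -pred_probs[:, gt_classes]                  # high probability -> low cost
    cost_box = torch.cdist(pred_boxes, gt_boxes, p=1)        # l1 distance between boxes
    cost = w_class * cost_class + w_box * cost_box
    pred_idx, gt_idx = linear_sum_assignment(cost.cpu().numpy())
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))
```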
In conclusion, in this embodiment 2, the invention uses a Sparse-transformer encoder-decoder model to solve the problems of poor fruit detection efficiency and insensitivity to small targets in the vision system of the fruit picking robot. The method is accurate and fast, and better meets agricultural needs such as fruit picking robots and yield prediction. The small-target enhancement technique is used to expand the sample space, so the method adapts well to small-sample data sets, has strong generalization ability, and can be applied to robot vision systems for picking various fruits or for yield prediction.
Example 3
In this embodiment 3, a fruit picking robot is provided, which includes a target fruit identification system in a background of the same color system, and the system can implement a target fruit identification method in a background of the same color system, including:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
Example 4
Embodiment 4 of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium is used to store computer instructions, and when the computer instructions are executed by a processor, the method for identifying a target fruit in a same color system background as described above is implemented, where the method includes:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
Example 5
Embodiment 5 of the present invention provides a computer program (product) comprising a computer program, which when run on one or more processors, is configured to implement a method for identifying a target fruit in a homochromatic background as described above, the method comprising:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
Example 6
An embodiment 6 of the present invention provides an electronic device, including: a processor, a memory, and a computer program; wherein a processor is connected to the memory, a computer program is stored in the memory, and when the electronic device runs, the processor executes the computer program stored in the memory to make the electronic device execute the instructions for implementing the target fruit identification method in the same color family background as described above, the method includes:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
In summary, the method and the system for identifying target fruits in a homochromatic background according to the embodiments of the present invention use a Sparse-transformer encoder-decoder model to solve the problems of poor fruit detection efficiency and insensitivity to small targets in the vision system of the fruit picking robot. The recognition is accurate and fast, and well meets agricultural needs such as fruit picking robots and yield prediction. The small-target enhancement technique is used to expand the sample space, so the method adapts well to small-sample data sets, has strong generalization ability, and can be applied to robot vision systems for picking various fruits or for yield prediction.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts based on the technical solutions disclosed in the present invention.
Claims (10)
1. A method for identifying target fruits in the same color system background is characterized by comprising the following steps:
acquiring an orchard environment image to be identified;
processing the orchard environment image to be recognized by using a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
2. The method of claim 1, wherein training the recognition model comprises: processing the training set with a deep convolutional neural network to extract features, constructing a sparse transformer model to process the features, processing the result with a feedforward neural network, and outputting the final detection result; then inputting test samples, evaluating the obtained detection results with the evaluation indexes, adjusting the model parameters according to the evaluation results, and repeatedly training the improved model until the optimal network model is obtained.
3. The method for identifying target fruits in the same color family background as claimed in claim 2, wherein a single lens reflex is used to collect green target fruit images under different illumination, different time periods and different angles; copying target fruits smaller than preset pixels in the image by using a small target enhancement technology to expand the sample, carrying out classification and labeling and constructing a data set; and dividing the expanded data set into a training set, a verification set and a test set.
4. The method for identifying target fruits in the same color family background as claimed in claim 2, wherein the encoder of the sparse transformer model is constructed as follows: the attention module that processes the feature map in the Transformer mechanism is replaced with a hole (dilated) self-attention module; the image features are processed and reduced in dimension, a spatial position complement code is added to supplement the lost information, and the result is fed into the hole self-attention mechanism with a residual module and a regularization layer for processing; the encoder output is produced through a feedforward neural network with its residual module and regularization layer.
5. The method for identifying target fruits in the same color family background as claimed in claim 4, wherein the decoder of the sparse transformer model is constructed as follows: the parameters learned by the encoder are input into a hole (dilated) self-attention mechanism with a residual module and a regularization layer for processing; the processed result is fed into a multi-head self-attention mechanism with a residual module and a regularization layer, and the output is processed by a feedforward neural network with its residual module and regularization layer to obtain the detection result.
6. The method of claim 5, wherein the feedforward neural network computes the result by a multi-layered perceptron with the ReLU activation function and the hidden dimension, and a linear projection layer.
7. The method for identifying the target fruit under the homochromatic system background as claimed in claim 2, wherein a Hungarian loss function and a SoftMax loss function are used for constructing a final loss function, optimizing a network model and performing model training.
8. A system for identifying a target fruit in a homochromatic background, comprising:
the acquiring module is used for acquiring an orchard environment image to be identified;
the recognition module is used for processing the orchard environment image to be recognized by utilizing a pre-trained recognition model to obtain a target fruit recognition result; the pre-trained recognition model is obtained by training a training set, wherein the training set comprises a plurality of orchard environment images and labels for labeling target fruits in the orchard environment images;
wherein,
and when the orchard environment image to be recognized is processed by the pre-trained recognition model, a spatial position complement code is added to the extracted image features to supplement the lost information.
9. A non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the method of identifying a target fruit in a homochromatic context of any of claims 1-6.
10. An electronic device, comprising: a processor, a memory, and a computer program; wherein a processor is connected to a memory, a computer program being stored in the memory, the processor executing the computer program stored in the memory when the electronic device is running, to cause the electronic device to execute instructions to implement the method for identifying a target fruit in a homochromatic context as claimed in any of the claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111228465.9A CN114187590A (en) | 2021-10-21 | 2021-10-21 | Method and system for identifying target fruits under homochromatic system background |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111228465.9A CN114187590A (en) | 2021-10-21 | 2021-10-21 | Method and system for identifying target fruits under homochromatic system background |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114187590A true CN114187590A (en) | 2022-03-15 |
Family
ID=80539840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111228465.9A Pending CN114187590A (en) | 2021-10-21 | 2021-10-21 | Method and system for identifying target fruits under homochromatic system background |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114187590A (en) |
- 2021-10-21: CN application CN202111228465.9A — publication CN114187590A (en), legal status: active, Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210224998A1 (en) * | 2018-11-23 | 2021-07-22 | Tencent Technology (Shenzhen) Company Limited | Image recognition method, apparatus, and system and storage medium |
CN111210010A (en) * | 2020-01-15 | 2020-05-29 | 上海眼控科技股份有限公司 | Data processing method and device, computer equipment and readable storage medium |
CN113076819A (en) * | 2021-03-17 | 2021-07-06 | 山东师范大学 | Fruit identification method and device under homochromatic background and fruit picking robot |
CN113269182A (en) * | 2021-04-21 | 2021-08-17 | 山东师范大学 | Target fruit detection method and system based on small-area sensitivity of variant transform |
CN113221874A (en) * | 2021-06-09 | 2021-08-06 | 上海交通大学 | Character recognition system based on Gabor convolution and linear sparse attention |
Non-Patent Citations (1)
Title |
---|
贾伟宽 (Jia Weikuan) et al.: "基于优化Transformer网络的绿色目标果实高效检测模型" [Efficient detection model for green target fruits based on an optimized Transformer network], 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), vol. 37, no. 14, 23 July 2021 (2021-07-23), pages 163-170 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114663814A (en) * | 2022-03-28 | 2022-06-24 | 安徽农业大学 | Fruit detection and yield estimation method and system based on machine vision |
CN114700941A (en) * | 2022-03-28 | 2022-07-05 | 中科合肥智慧农业协同创新研究院 | Strawberry picking method based on binocular vision and robot system |
CN114700941B (en) * | 2022-03-28 | 2024-02-27 | 中科合肥智慧农业协同创新研究院 | Strawberry picking method based on binocular vision and robot system |
CN114663814B (en) * | 2022-03-28 | 2024-08-23 | 安徽农业大学 | Fruit detection and yield estimation method and system based on machine vision |
CN115952830A (en) * | 2022-05-18 | 2023-04-11 | 北京字跳网络技术有限公司 | Data processing method and device, electronic equipment and storage medium |
CN115952830B (en) * | 2022-05-18 | 2024-04-30 | 北京字跳网络技术有限公司 | Data processing method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111768432B (en) | Moving target segmentation method and system based on twin deep neural network | |
Mathur et al. | Crosspooled FishNet: transfer learning based fish species classification model | |
CN111191583B (en) | Space target recognition system and method based on convolutional neural network | |
CN108021947B (en) | A kind of layering extreme learning machine target identification method of view-based access control model | |
CN114187590A (en) | Method and system for identifying target fruits under homochromatic system background | |
CN112364931B (en) | Few-sample target detection method and network system based on meta-feature and weight adjustment | |
CN114332621B (en) | Disease and pest identification method and system based on multi-model feature fusion | |
Lee et al. | Plant Identification System based on a Convolutional Neural Network for the LifeClef 2016 Plant Classification Task. | |
CN114187450A (en) | Remote sensing image semantic segmentation method based on deep learning | |
CN113920472B (en) | Attention mechanism-based unsupervised target re-identification method and system | |
CN113269182A (en) | Target fruit detection method and system based on small-area sensitivity of variant transform | |
CN114724155A (en) | Scene text detection method, system and equipment based on deep convolutional neural network | |
CN111125397B (en) | Cloth image retrieval method based on convolutional neural network | |
CN113034506A (en) | Remote sensing image semantic segmentation method and device, computer equipment and storage medium | |
CN114359631A (en) | Target classification and positioning method based on coding-decoding weak supervision network model | |
CN116434045B (en) | Intelligent identification method for tobacco leaf baking stage | |
CN114612802A (en) | System and method for classifying fine granularity of ship target based on MBCNN | |
CN113420173A (en) | Minority dress image retrieval method based on quadruple deep learning | |
CN113076819A (en) | Fruit identification method and device under homochromatic background and fruit picking robot | |
CN113496221B (en) | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering | |
CN114463614A (en) | Significance target detection method using hierarchical significance modeling of generative parameters | |
CN113011506A (en) | Texture image classification method based on depth re-fractal spectrum network | |
Si et al. | Image semantic segmentation based on improved DeepLab V3 model | |
CN117872127A (en) | Motor fault diagnosis method and equipment | |
CN117710841A (en) | Small target detection method and device for aerial image of unmanned aerial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |