CN114663760A - Model training method, target detection method, storage medium and computing device - Google Patents
- Publication number: CN114663760A
- Application number: CN202210302367.3A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
Abstract
The invention relates to the technical field of machine learning, and provides a model training method, a target detection method, a storage medium and a computing device. The model training method comprises: acquiring a plurality of target images, which include labeled source domain images and unlabeled target domain images; for each of the plurality of target images, inputting the target image into a feature extractor for feature extraction to obtain a first feature map, inputting the first feature map into a classifier to obtain a classification result of the target image, and inputting the first feature map into a first discriminator to obtain a first probability value that the target image belongs to the source domain; and training the feature extractor, the classifier and the first discriminator based on the classification results, the first probability values and the labels of the source domain images, so that the feature extractor learns a feature distribution that can be shared by the source domain images and the target domain images, and the classifier can classify the target domain images more accurately.
Description
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a model training method, a target detection method, a storage medium, and an electronic device.
Background
Large-scale forest inventory is a key research problem. Today, the rapid growth of remote sensing imagery and deep learning algorithms brings new opportunities for large-scale detection of forest trees such as oil palms. However, large-scale tree counting and detection must contend with remote sensing images captured under different acquisition conditions, such as different sensors, seasons and environments, which lead to different distributions between images. For example, as shown in FIG. 1, image A and image B are two different satellite images. Assume that image A is an image with enough labels, drawn from 4 classes: oil palm trees, the area between oil palm trees, other vegetation or bare land, and impervious layers or clouds, while image B is an image without labels. Due to differences in sensors, acquisition dates and regions, the histograms of the 4 classes (which characterize the distribution of image pixel values) differ significantly between image A and image B. Even if a feature extractor and classifier achieve excellent detection and classification accuracy on a labeled image, their performance may degrade drastically when applied directly to an image without any labels, such as image B.
Therefore, how to improve the performance of the feature extractor and the classifier on the label-free image becomes an urgent problem to be solved.
Disclosure of Invention
The invention provides a model training method, a target detection method, a computer-readable storage medium and an electronic device. A feature extractor is trained to learn a feature distribution that can be shared by the source domain images and the target domain images, so that a classifier can accurately classify the target domain images and the labels of the source domain images are transferred to the target domain; in addition, because the forest regions corresponding to the source domain images and the target domain images are different, cross-region forest detection can be achieved.
In a first aspect, the present invention provides a method for model training, comprising:
obtaining a plurality of target images based on remote sensing images obtained by respectively shooting different forest regions, wherein the target images comprise source domain images with labels and target domain images without labels, and the forest regions corresponding to the source domain images and the target domain images are different;
for each image of the plurality of target images:
substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map;
inputting the first feature map into a classifier to obtain a classification result of the target image;
inputting the first feature map into a first discriminator to obtain a first probability value of the target image belonging to a source domain;
obtaining a first loss indicating an uncertainty of the classification result based on the target detection classification result and the first probability value corresponding to each of the plurality of target images;
obtaining a second loss indicating a classification error of the classifier based on the label of each image of the source domain and the corresponding classification result;
obtaining, based on the first probability values respectively corresponding to the plurality of target images, a third loss indicating the error of source domain classification based on the first feature map;
training the feature extractor, classifier, and first discriminator based on the first loss, the second loss, and the third loss.
In a second aspect, the present invention provides a method for target detection, comprising:
acquiring a target image to be detected;
segmenting the target image and determining a plurality of sub-images;
respectively carrying out detection classification on the multiple sub-images through a feature extractor and a classifier, and determining a detection classification result; the feature extractor and the classifier are obtained by training according to any one of the above methods of the first aspect, and the detection classification result includes a plurality of target frames and the respective categories of the plurality of target frames;
and merging the target frames belonging to the same category, and determining a target detection result of the target image.
In a third aspect, the present invention provides an apparatus for model training, comprising:
the system comprises an image acquisition module, a storage module and a processing module, wherein the image acquisition module is used for acquiring a plurality of target images based on remote sensing images acquired by respectively shooting different forest regions, the target images comprise source domain images with labels and target domain images without labels, and the forest regions corresponding to the source domain images and the target domain images are different;
a classification module to, for each of the plurality of target images: substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map; inputting the first feature map into a classifier to obtain a classification result of the target image; inputting the first feature map into a first discriminator to obtain a first probability value of the target image belonging to a source domain;
a first loss calculation module, configured to obtain a first loss indicating an uncertainty of the classification result based on the target detection classification result and the first probability value corresponding to each of the plurality of target images;
a second loss calculation module, configured to obtain a second loss indicating a classification error of the classifier based on the label and the corresponding classification result of each image of the source domain;
a third loss calculation module, configured to obtain a third loss based on the first probability values corresponding to the plurality of target images, where the third loss indicates an error of source domain classification based on the first feature map;
a training module to train the feature extractor, the classifier, and the first discriminator based on the first loss, the second loss, and the third loss.
In a fourth aspect, the present invention provides an apparatus for object detection, comprising:
the image acquisition module is used for acquiring a target image to be detected;
the segmentation module is used for segmenting the target image and determining a plurality of sub-images;
the classification module is used for respectively carrying out detection classification on the multiple sub-images through the feature extractor and the classifier and determining a detection classification result; the feature extractor and the classifier are trained by the method of any one of the first aspect, and the detection classification result includes a plurality of target frames and the respective categories of the target frames;
and the merging module is used for merging the target frames belonging to the same category and determining the target detection result of the target image.
In a fifth aspect, the invention provides a computer-readable storage medium comprising executable instructions which, when executed by a processor of an electronic device, perform the method according to any one of the first or second aspects.
In a sixth aspect, the present invention provides an electronic device, comprising a processor and a memory storing execution instructions, wherein when the processor executes the execution instructions stored in the memory, the processor performs the method according to any one of the first aspect or the second aspect.
The invention provides a model training method, a model training device, a computer-readable storage medium and an electronic device. The method obtains a plurality of target images based on remote sensing images captured over different forest regions, the plurality of target images comprising labeled source domain images and unlabeled target domain images, where the forest regions corresponding to the source domain images and the target domain images are different. For each of the plurality of target images, the target image is input into a feature extractor for feature extraction to obtain a first feature map; the first feature map is input into a classifier to obtain a classification result of the target image; and the first feature map is input into a first discriminator to obtain a first probability value that the target image belongs to the source domain. Then, a first loss indicating the uncertainty of the classification results is obtained based on the classification results and the first probability values corresponding to the plurality of target images; a second loss indicating the classification error of the classifier is obtained based on the labels of the source domain images and the corresponding classification results; a third loss indicating the error of source domain classification based on the first feature maps is obtained from the first probability values; and the feature extractor, the classifier and the first discriminator are trained based on the first loss, the second loss and the third loss. In summary, in the technical scheme of the invention, the feature extractor is trained to learn a feature distribution that can be shared by the source domain images and the target domain images, so that the classifier can classify the target domain images more accurately and the labels of the source domain images are transferred to the target domain; in addition, because the forest regions corresponding to the source domain images and the target domain images are different, cross-region forest detection can be achieved.
Further effects of the above preferred implementations are described below in conjunction with specific embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a schematic diagram of histograms of images for different acquisition conditions according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for model training according to an embodiment of the present invention;
fig. 4 is a first flowchart illustrating a method for target detection according to an embodiment of the present invention;
fig. 5 is a schematic flowchart illustrating a second method for target detection according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an apparatus for target detection according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail and completely with reference to the following embodiments and accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, the source domain and the target domain are introduced.
Transfer learning applies the knowledge or patterns learned on a source domain to a different but related target domain. The source domain is a domain different from that of the test samples but with abundant supervision information; the target domain is the domain where the test samples are located, with no labels or only a few. The data distributions of the source domain and the target domain differ, but the task is the same; here, the task is what is to be done, such as forest identification and classification.
In the embodiment of the invention, model training is carried out with a source domain and a target domain so as to realize forest detection and classification tasks, such as the detection and classification of oil palm trees. It should be noted that the forest here is of a certain category; in practical applications, multiple categories of forest may also be considered. The embodiment of the invention is described by taking a forest of a certain category (called the target category for convenience of description and distinction) as an example. The source domain includes a plurality of labeled images, and the target domain includes a plurality of unlabeled images. Since forest trees and forest regions have high similarity, in order to improve the accuracy of forest identification, the labels of the embodiment of the present invention include forest regions and forest categories; other labels may be designed according to the actual situation, such as other vegetation or bare land, impervious layers or clouds, and the like.
In practical applications, the images of the source domain and the target domain are usually remote sensing images, and a remote sensing image usually contains a large number of forest trees; the images of the source domain and the target domain are therefore segmented from remote sensing images. In a specific embodiment, when constructing the source domain, a plurality of labels are preset; then, remote sensing images captured over one or more regions (containing forest trees of the target category) are selected and manually annotated, for example by marking regions and the labels they belong to; the annotated remote sensing images are then segmented, and the segmented images (each carrying a label; for example, an image that is part of a marked region can take the label of that region) form the source domain. In practical application, the remote sensing image may also be segmented first and labeled afterwards. Then, remote sensing images captured over one or more regions (containing forest trees of the target category) different from the above regions are segmented, and the segmented images form the target domain. Illustratively, the size of the images in the source and target domains may be 17 × 17 pixels.
The above construction of the target domain and the source domain is only an example and is not a specific limitation, as long as it is ensured that the acquisition conditions of the source domain and the target domain differ, such as different sensors, seasons or regions, preferably different regions.
It should be noted that the data distributions of the source domain and the target domain of the embodiment of the present invention are different. If only a large number of labeled source domain images are used for training, the trained classifier will not perform well on the target domain. Based on this, the embodiment of the present invention proposes training the model by means of adversarial transfer learning.
Next, adversarial transfer learning is described.
Fig. 2 shows a structure of a recognition model provided by an embodiment of the present invention. In the embodiment of the present invention, as shown in fig. 2, the recognition model includes a feature extractor, a classifier, and a first discriminator.
Adversarial transfer learning is a form of unsupervised deep transfer learning, in which a feature extractor, a classifier and a first discriminator form an adversarial transfer learning network model. The goal of adversarial transfer learning is to extract features from the source domain images and the target domain images such that the first discriminator cannot distinguish whether the extracted features come from the source domain or the target domain.
The feature extractor maps an image into a specific feature space such that the classifier can predict the labels of source domain images while the first discriminator cannot distinguish whether an image comes from the source domain or the target domain.
The feature extractor includes N convolution blocks, for example N = 5, which is used as an example below. In one example, a convolution block includes a convolutional layer, a Batch Normalization (BN) layer, an Instance Normalization (IN) layer, and an activation layer. The convolutional layer performs image processing through a Convolutional Neural Network (CNN); the activation layer applies an activation function, for example ReLU (Rectified Linear Unit). It should be noted that although the BN layer can effectively accelerate model convergence, it makes the CNN insensitive to image changes; the IN layer is added to eliminate the differences between individual images, thereby enhancing the generalization ability of the network. The BN layer and the IN layer are prior art and are not described again in the embodiments of the present invention.
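To make the block structure concrete, the following is a minimal PyTorch-style sketch of one such convolution block; the kernel size, channel widths and the exact ordering of BN before IN are illustrative assumptions, since the embodiment only names the four layer types.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: convolution -> BN -> IN -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # convolutional layer
        self.bn = nn.BatchNorm2d(out_ch)         # accelerates convergence
        self.inorm = nn.InstanceNorm2d(out_ch)   # suppresses per-image differences
        self.act = nn.ReLU(inplace=True)         # activation layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.inorm(self.bn(self.conv(x))))

# For example, a feature extractor body with N = 5 blocks (channel widths assumed):
blocks = nn.Sequential(*[ConvBlock(c_in, c_out) for c_in, c_out in
                         [(3, 32), (32, 64), (64, 64), (64, 128), (128, 128)]])
```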
Further, the feature extractor comprises a pooling layer located between the j-th convolution block and the (j+1)-th convolution block, where j is a positive integer greater than 1; exemplarily, j = 2. The pooling layer may use max pooling or average pooling. Correspondingly, on the basis that the feature extractor comprises the pooling layer, the feature extractor may further comprise a second discriminator. The second discriminator outputs the probability $\hat{d}_i$ that an image belongs to the source domain; by way of example, when $\hat{d}_i$ is greater than or equal to 0.5, the feature map is judged to belong to the source domain, and when it is less than 0.5, it is judged to belong to the target domain. Further, the feature map output by the convolution layer of the last convolution block of the feature extractor is processed by the following formula (1) to obtain a new feature map $h_i$:

$$h_i = (1 + V_i^F) \cdot f_i \qquad (1)$$

where $f_i$ is the feature map output by the last convolution layer; $h_i$ is a new feature map containing transferability information; and $V_i^F$ is the feature-level attention value. Features of an image with stronger transferability can be given a larger feature-level attention value.
It should be noted that the purpose of feature-level attention is to find the features of an image with stronger transferability between the source domain and the target domain, so as to map the source domain and the target domain from the original feature space into a new feature space (in which the source domain and the target domain have the same data distribution), such that the second discriminator cannot distinguish whether an image comes from the target domain or the source domain. To measure this transferability, embodiments of the present invention describe the uncertainty using information entropy. The information entropy, also called Shannon entropy, is calculated by the following formula (2):

$$E(p) = -\sum_d P_d \cdot \log(P_d) \qquad (2)$$

where for d = 0, $P_d$ represents the probability that the image belongs to the target domain, and for d = 1, $P_d$ represents the probability that the image belongs to the source domain; the embodiment of the present application considers only the case where d = 1. According to information theory, the larger the entropy, the larger the amount of information and the stronger the transferability of the image. Correspondingly, the feature extractor calculates the feature-level attention value $V_i^F$ by the following formula (3):

$$V_i^F = E\big(\hat{d}_i\big) \qquad (3)$$

where $\hat{d}_i$ is the source-domain probability output by the second discriminator for the $i$-th image, $E(\cdot)$ is the information entropy of formula (2) applied to that output (i.e., to the pair $(\hat{d}_i,\, 1-\hat{d}_i)$), and the result output by $E(\cdot)$ is the information entropy.
In this way, the feature extractor can effectively measure the transferability of the feature maps, knowing which feature maps are more suitable for classification and which have a negative effect on classification. Therefore, a connection is established between the feature map output by the convolution layer of the last convolution block of the feature extractor and the feature-level attention value, generating a new feature map that contains the transferability information.
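As one concrete reading of formulas (1) to (3), the sketch below computes the information entropy of the second discriminator's source-domain probability and uses it to re-weight a feature map; the residual form (1 + V) * f and the tensor shapes are assumptions made here for illustration.

```python
import torch

def binary_entropy(p_src: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Formula (2): Shannon entropy of the (source, target) probability pair per image."""
    p = torch.stack([p_src, 1.0 - p_src], dim=-1).clamp(eps, 1.0)
    return -(p * p.log()).sum(dim=-1)

def feature_level_attention(f: torch.Tensor, p_src: torch.Tensor) -> torch.Tensor:
    """Formulas (1)/(3): h_i = (1 + V_i^F) * f_i with V_i^F = E(d_hat_i).
    f: (n, C, H, W) feature maps; p_src: (n,) second-discriminator probabilities."""
    v = binary_entropy(p_src)                  # larger entropy -> stronger transferability
    return (1.0 + v).view(-1, 1, 1, 1) * f     # broadcast the attention over C, H, W
```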
The classifier classifies the source domain images, predicting the correct labels as accurately as possible.
The first discriminator classifies the images in the feature space, distinguishing whether an image comes from the source domain or the target domain as accurately as possible.
In addition, the feature extractor and the classifier form the classification-detection part, and the feature extractor and the first discriminator form the domain-discrimination part. The optimization targets of the feature extractor and the first discriminator are opposite: the first discriminator tries to judge whether an image comes from the source domain or the target domain, while the feature extractor tries to prevent the first discriminator from judging the origin of the image. Through this adversarial game, the feature distributions of the source domain images and the target domain images output by the feature extractor are drawn close; that is, the source domain and the target domain, which have different distributions, are mapped into the same feature space, and a measurement criterion is sought under which their distance in that space is as small as possible. The classifier can therefore accurately classify both source domain images and target domain images.
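These opposing optimization targets are commonly realized with a gradient reversal layer, so that a single backward pass updates the discriminator normally while updating the feature extractor adversarially; the sketch below is the standard PyTorch pattern for that technique, offered as an assumption since the embodiment does not prescribe a particular mechanism.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda in the backward
    pass, so layers before it are trained to confuse the discriminator after it."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)
```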
The loss function of the recognition model provided by the embodiment of the present invention is described next.
Specifically, the overall loss is given by the following formula (4):

$$L = L_C + \mu L_S + \alpha L_D + \beta L_E \qquad (4)$$

where $L_S$ represents the shallow feature domain loss, $L_D$ the deep feature domain loss, and $L_E$ the entropy loss; $\mu$, $\alpha$ and $\beta$ are hyper-parameters that balance the shallow domain loss, the deep domain loss and the entropy loss. Since the optimization targets of the feature extractor and the discriminators are opposite, the domain losses $L_S$ and $L_D$ are optimized adversarially: the discriminators are trained to minimize them while the feature extractor is trained to maximize them. $L_C$ represents the classification loss of the labeled source domain images, calculated by the following formula (5):

$$L_C = \frac{1}{n_s}\sum_{i=1}^{n_s} L_y\big(G_y(x_i^s),\, y_i\big) \qquad (5)$$

where $L_y(\cdot)$ is the cross-entropy loss function; $G_y(\cdot)$ is the classifier, whose output is the predicted probability that the $i$-th image $x_i^s$ of the source domain belongs to class $y_i$; and $n_s$ is the number of source domain images.
The shallow feature domain loss is used to make the feature extractor learn features with stronger transferability between the source domain and the target domain, so that the second discriminator inside the feature extractor cannot distinguish whether an image comes from the target domain or the source domain. It is calculated by the following formula (6):

$$L_S = \frac{1}{n_s}\sum_{i=1}^{n_s} L_d\big(G_d(f'_{S,i}),\, 1\big) + \frac{1}{n_t}\sum_{i=1}^{n_t} L_d\big(G_d(f'_{T,i}),\, 0\big) \qquad (6)$$

where $G_d(\cdot)$ represents the second discriminator and $L_d(\cdot)$ the binary cross-entropy loss of $G_d(\cdot)$; the domain label equals 1 for images of the source domain and 0 for images of the target domain; $f'_{S,i}$ is the feature map of the $i$-th image of the source domain output by the $j$-th convolution block, and $f'_{T,i}$ is the feature map of the $i$-th image of the target domain output by the $j$-th convolution block; $n_t$ is the number of target domain images. Notably, $L_S$ represents the error of source domain classification based on shallow features.
It should be noted that the feature map output by the feature extractor has passed through the pooling layer, which results in the loss of shallow feature information. In addition, since the transferability of each image is different, images that are not similar in the feature space negatively affect the transfer of source domain features to the target domain, and thereby the classification performance of the classifier. The feature extractor needs to obtain features with better transferability, so that the second discriminator cannot distinguish whether an image comes from the target domain or the source domain, while the classifier can still complete the classification task well using the features output by the feature extractor. As shown in FIG. 2, embodiments of the present invention therefore place the shallow feature domain loss before the pooling layer.
The deep feature domain loss is used to make the feature extractor learn deep features of an image with stronger transferability between the source domain images and the target domain images, and is calculated by the following formula (7):

$$L_D = \frac{1}{n_s}\sum_{i=1}^{n_s} L_d\big(g_d(h_{S,i}),\, 1\big) + \frac{1}{n_t}\sum_{i=1}^{n_t} L_d\big(g_d(h_{T,i}),\, 0\big) \qquad (7)$$

where $g_d(\cdot)$ represents the first discriminator and $L_d(\cdot)$ the binary cross-entropy loss of $g_d(\cdot)$; the domain label equals 1 for source domain images and 0 for target domain images; $h_{S,i}$ and $h_{T,i}$ are the new feature maps of the $i$-th source domain image and the $i$-th target domain image, respectively. To explain further, the output of $g_d(\cdot)$ is the probability that an image belongs to the source domain. Thus, the domain loss comprises the shallow feature domain loss $L_S$ and the deep feature domain loss $L_D$.
It should be noted that, since the transferability of each image is different, images that are not similar in the feature space negatively affect the transfer of source domain knowledge to the target domain, and thereby the classification performance of the classifier. As shown in FIG. 2, embodiments of the present invention place the deep feature domain loss before the classifier.
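The two domain losses of formulas (6) and (7) share the same binary cross-entropy form, differing only in which discriminator and which feature maps they consume. A minimal sketch, assuming the discriminators output probabilities in (0, 1):

```python
import torch
import torch.nn.functional as F

def domain_loss(p_src: torch.Tensor, p_tgt: torch.Tensor) -> torch.Tensor:
    """Formulas (6)/(7): binary cross-entropy with domain label 1 for source
    images and 0 for target images. p_src / p_tgt are discriminator outputs."""
    loss_s = F.binary_cross_entropy(p_src, torch.ones_like(p_src))
    loss_t = F.binary_cross_entropy(p_tgt, torch.zeros_like(p_tgt))
    return loss_s + loss_t
```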
$L_E$ is calculated by the following formula (8):

$$L_E = \frac{1}{n}\sum_{i=1}^{n} V_i^E \cdot \Big(-\sum_{c=1}^{C} p_{i,c}\,\log(p_{i,c})\Big) \qquad (8)$$

where $L_E$ represents the entropy loss, which accounts for the uncertainty of the classifier's classification results; $V_i^E$ is the entropy-level attention value; $C$ is the number of classes, e.g. 4; $p_{i,c}$ is the predicted probability that the $i$-th image belongs to class $c$; and $n$ is the number of images.

It should be noted that entropy-level attention is similar to feature-level attention and describes the degree of attention paid to the information loss of an image. Images with low transferability may prevent the feature extractor from learning highly transferable features and reduce the classification precision of the classifier; images with high transferability are therefore given a higher entropy-level attention value and images with low transferability a lower one, so that the feature extractor focuses more on highly transferable features. Correspondingly, the entropy-level attention value $V_i^E$ can be calculated by the following formula (9):

$$V_i^E = E\big(\hat{d}_i\big) \qquad (9)$$

where $\hat{d}_i$ is the source-domain probability output by the first discriminator for the $i$-th image, and the result output by $E(\cdot)$ is the information entropy.
It should be appreciated that, following the entropy function of information theory, the entropy loss is used to reduce the uncertainty of the output class probabilities. Because target images are used, the entropy loss serves two purposes: on the one hand, since forest trees in an image have large similarity with the regions between them, the entropy loss can improve the prediction confidence for confusable samples; on the other hand, for images with poor similarity, forcibly raising their confidence would adversely affect the classifier. The entropy loss is therefore weighted by the entropy-level attention values, so that minimal-entropy regularization based on entropy-level attention makes the classifier's predictions more reliable. The entropy loss acts as a regularization penalty term of the loss function, imposing constraints on certain parameters.
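Putting formulas (8) and (9) together, the following sketch weights each image's prediction entropy by its entropy-level attention value, reusing binary_entropy from the earlier sketch; the shapes and the detached attention weights are assumptions for illustration.

```python
import torch

def entropy_loss(class_probs: torch.Tensor, p_src: torch.Tensor,
                 eps: float = 1e-8) -> torch.Tensor:
    """Formula (8): L_E = mean_i V_i^E * (-sum_c p_ic log p_ic).
    class_probs: (n, C) softmax outputs; p_src: (n,) first-discriminator probabilities."""
    v_e = binary_entropy(p_src).detach()         # formula (9): entropy-level attention
    p = class_probs.clamp(eps, 1.0)
    sample_entropy = -(p * p.log()).sum(dim=1)   # per-image prediction entropy
    return (v_e * sample_entropy).mean()
```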
As shown in fig. 3, a method for training a model according to an embodiment of the present invention is provided. The method provided by the embodiment of the invention can be applied to electronic equipment, and particularly can be applied to a server or a general computer. In this embodiment, the method specifically includes the following steps:
According to a feasible implementation mode, a first remote sensing image obtained by shooting a first forest region by a first sensor is segmented to obtain a plurality of images; then, labels are added to the images, and a plurality of source domain images are obtained. Of course, in practical applications, the first sensor usually takes a plurality of first remote sensing images, and the processing procedure of each first remote sensing image is the same.
According to a feasible implementation mode, a second remote sensing image obtained by shooting a second forest area by a second sensor is segmented to obtain a plurality of images; without adding labels to the images, a plurality of target domain images are obtained. Of course, in practical applications, the second sensor usually takes a plurality of second remote sensing images, and the processing procedure of each second remote sensing image is the same.
Illustratively, the first sensor and the second sensor are different. For example, the first sensor and the second sensor are sensors on different satellites.
For example, the first sensor and the second sensor may also be identical, but the times of acquisition are different.
Illustratively, the first forest area and the second forest area are different, thereby realizing cross-area identification.
Illustratively, the size of the target image may be 17 × 17 pixels.
Illustratively, the first and second forest zones comprise forest trees belonging to a target category, such as oil palm trees. Of course, the oil palm tree is merely an example and is not limited in particular, and the source domain image and the target domain image may be determined according to actual requirements, which is not limited in this embodiment of the present invention. The source domain image is an image in the source domain, and the target domain image is an image in the target domain.
For the related content of the source domain and the target domain construction, see above, the detailed description is omitted here.
For a detailed description of the feature extractor and the classifier, reference is made to the above description and no further description is made here.
According to a feasible implementation manner, for each of the plurality of target images, the Shannon entropy is calculated based on the corresponding first probability value, and an attention value is calculated based on the Shannon entropy; then, the first loss is determined based on the attention values corresponding to the plurality of target images and the probability values of the categories in the classification results.
The first loss corresponds to the entropy loss $L_E$. It should be noted that the higher the probability of a certain category in the classification result relative to the other categories, the lower the uncertainty of the classification result; if the probabilities of several categories in the classification result differ little, the uncertainty of the classification result is higher. The first loss is therefore designed to reduce the uncertainty of the classification results output by the classifier.
Step 304: obtain a second loss, indicating the classification error of the classifier, based on the label of each source domain image and the corresponding classification result.
The second loss corresponds to the classification loss $L_C$. It should be noted that the smaller the classification loss, the higher the probability the classifier predicts for the label of a source domain image, and the more accurate the classification result.
The third loss corresponds to the above-mentioned deep feature domain loss $L_D$. Note that the smaller the deep feature domain loss $L_D$, the smaller the error of source domain classification performed on the first feature maps output by the feature extractor for the source domain images and the target domain images.
According to one possible implementation, the feature extractor includes a first extraction layer, a second extraction layer, and a second discriminator. The first extraction layer is used for extracting shallow features of the target image to obtain a second feature map; the second discriminator is used for judging, based on the second feature map, a second probability value that the target image belongs to the source domain; the second extraction layer is used for extracting deep features based on the second feature map and the second probability value to obtain the first feature map.
In one example, the feature extractor comprises a plurality of convolution blocks and a pooling layer; the first extraction layer comprises the convolution blocks before the pooling layer, and the second extraction layer comprises the pooling layer and the convolution blocks after it. A single convolution block includes a convolution layer, a batch normalization layer, an instance normalization layer, and an activation layer. Details are given above and are not repeated here.
Further, still include: and obtaining a fourth loss based on the second probability values corresponding to the target images respectively, wherein the fourth loss indicates the error of source domain classification based on the second feature map.
Then, the feature extractor, the classifier and the first discriminator are trained based on the first loss, the second loss, the third loss and the fourth loss, so as to find features with strong transferability between the source domain images and the target domain images and construct a highly transferable feature space; the feature extractor maps the source domain images and the target domain images into this feature space, and the classifier can then classify the target domain images.
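Combining the pieces above, a single training step might look like the following. The function names (an extractor returning both feature maps, grad_reverse, domain_loss, entropy_loss), the optimizer setup and the loss weights mu, alpha, beta are all assumptions carried over from the earlier sketches, not the patent's prescribed implementation.

```python
import torch
import torch.nn.functional as F

def training_step(extractor, classifier, disc_shallow, disc_deep, optimizer,
                  x_src, y_src, x_tgt, mu=0.1, alpha=0.1, beta=0.01):
    """One adversarial step over a source batch (x_src, y_src) and a target batch x_tgt.
    extractor(x) is assumed to return (shallow feature map, deep feature map)."""
    fs_shallow, fs_deep = extractor(x_src)
    ft_shallow, ft_deep = extractor(x_tgt)

    # second loss (formula (5)): supervised classification on labeled source images
    logits_src = classifier(fs_deep)
    loss_cls = F.cross_entropy(logits_src, y_src)

    # fourth / third losses (formulas (6)/(7)): shallow and deep domain classification,
    # made adversarial through the gradient reversal layer sketched earlier
    loss_shallow = domain_loss(disc_shallow(grad_reverse(fs_shallow)),
                               disc_shallow(grad_reverse(ft_shallow)))
    p_src_deep = disc_deep(grad_reverse(fs_deep))
    p_tgt_deep = disc_deep(grad_reverse(ft_deep))
    loss_deep = domain_loss(p_src_deep, p_tgt_deep)

    # first loss (formula (8)): attention-weighted prediction entropy on all images
    probs = torch.softmax(torch.cat([logits_src, classifier(ft_deep)], dim=0), dim=1)
    p_src_all = torch.cat([p_src_deep, p_tgt_deep], dim=0).flatten()
    loss_ent = entropy_loss(probs, p_src_all)

    loss = loss_cls + mu * loss_shallow + alpha * loss_deep + beta * loss_ent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```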
According to the technical scheme, the beneficial effects of the embodiment are as follows:
the feature extractor is trained to learn a feature distribution that can be shared by the source domain images and the target domain images, so that the classifier can accurately classify the target domain images and the labels of the source domain images are transferred to the target domain images; in addition, because the forest regions corresponding to the source domain images and the target domain images are different, cross-region forest detection can be achieved.
Fig. 4 shows a method for detecting a target according to an embodiment of the present invention. The method provided by the embodiment of the invention can be applied to electronic equipment, and particularly can be applied to a server or a general computer. In this embodiment, the method specifically includes the following steps:
According to one possible embodiment, an image of a third forest area captured by a third sensor is acquired and taken as the target image. The forest area corresponding to this target image may be different from the forest areas corresponding to the plurality of target images used in training.
According to one possible embodiment, the target image is divided with an overlapping sliding window, yielding a plurality of sub-images whose size meets the input requirement of the model, for example 17 × 17 pixels.
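A minimal sketch of such an overlapping sliding-window split follows; the stride and the handling of borders are assumptions, since the embodiment only specifies the overlap and the 17 × 17 output size.

```python
import numpy as np

def sliding_window_split(image: np.ndarray, win: int = 17, stride: int = 9):
    """Split an (H, W, C) image into overlapping win x win sub-images, returning
    each sub-image with its top-left (row, col) so that detections on sub-images
    can be mapped back into target-image coordinates."""
    h, w = image.shape[:2]
    subs = []
    for r in range(0, max(h - win, 0) + 1, stride):
        for c in range(0, max(w - win, 0) + 1, stride):
            subs.append(((r, c), image[r:r + win, c:c + win]))
    return subs
```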
As shown in fig. 5, each of the multiple sub-images is input in turn into the feature extractor and the classifier, and the detection classification results of the classifier for the multiple sub-images are obtained, including the coordinates and the category of each target frame; a single sub-image may have multiple target frames. In practical applications, the classifier outputs a probability distribution (containing probability values for a plurality of classes), and the class corresponding to the maximum probability value in the distribution is taken as the class of the target frame in the detection classification result.
Step 404: merge the target frames belonging to the same category, and determine the target detection result of the target image.
According to a feasible implementation mode, target frames of the same category are merged using an Intersection-over-Union (IoU) based criterion; since the IoU-based merging method requires no iterative steps, it improves merging efficiency. For example, if the IoU of two target frames with the same category is greater than or equal to a given threshold, the coordinates of the two target frames are averaged. In practical application, all target frames whose IoU is greater than or equal to the given threshold are merged; the merged target frame is calculated by the following formula (10):

$$(X_{lt},\, Y_{lt}) = \frac{1}{n}\sum_{i=1}^{n}(x_{lt,i},\, y_{lt,i}), \qquad (X_{rb},\, Y_{rb}) = \frac{1}{n}\sum_{i=1}^{n}(x_{rb,i},\, y_{rb,i}) \qquad (10)$$

where $(X_{lt}, Y_{lt})$ are the coordinates of the top-left corner of the merged target frame; $(X_{rb}, Y_{rb})$ are the coordinates of the bottom-right corner of the merged target frame; $n$ is the number of target frames whose IoU is greater than the threshold; $(x_{lt,i}, y_{lt,i})$ are the coordinates of the top-left corner of the $i$-th of those target frames; and $(x_{rb,i}, y_{rb,i})$ are the coordinates of its bottom-right corner.
In practical applications, only the object boxes of a specified category, such as the oil palm tree category, may be merged.
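The following sketch implements this non-iterative IoU-based merge: per formula (10), the corners of all same-category boxes overlapping a seed box beyond the threshold are averaged. Seeding on the first unmerged box is an assumption; the embodiment only specifies the averaging and the threshold test.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (x_lt, y_lt, x_rb, y_rb)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-8)

def merge_boxes(boxes, thr: float = 0.5) -> np.ndarray:
    """Formula (10): replace each group of same-category boxes whose IoU with the
    seed box is >= thr by the coordinate-wise average of the group."""
    boxes = [np.asarray(b, dtype=float) for b in boxes]
    merged, used = [], [False] * len(boxes)
    for i, seed in enumerate(boxes):
        if used[i]:
            continue
        used[i] = True
        group = [seed]
        for j in range(i + 1, len(boxes)):
            if not used[j] and iou(seed, boxes[j]) >= thr:
                used[j] = True
                group.append(boxes[j])
        merged.append(np.mean(group, axis=0))   # averages corners, formula (10)
    return np.array(merged)
```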
According to the technical scheme, the beneficial effects of the embodiment are as follows:
the target image is divided, the divided images are respectively detected and classified, and then the target frames with the same type are combined, so that the accuracy of the target detection result of the image is improved.
Referring to fig. 6, based on the same concept as the method embodiment of the present invention, an embodiment of the present invention further provides a model training apparatus, including:
the image acquisition module 601 is configured to obtain a plurality of target images based on remote sensing images obtained by respectively shooting different forest regions, where the plurality of target images include a source domain image with a tag and a target domain image without a tag, and forest regions corresponding to the source domain image and the target domain image are different;
a classification module 602 configured to, for each of the plurality of target images: substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map; inputting the first feature map into a classifier to obtain a classification result of the target image; inputting the first feature map into a first discriminator to obtain a first probability value of the target image belonging to a source domain;
a first loss calculation module 603, configured to obtain a first loss indicating an uncertainty of the classification result based on the target detection classification result and the first probability value corresponding to each of the plurality of target images;
a second loss calculation module 604, configured to obtain a second loss indicating a classification error of the classifier based on the label and the corresponding classification result of each image of the source domain;
a third loss calculation module 605, configured to obtain a third loss based on the first probability values corresponding to the plurality of target images, where the third loss indicates an error of source domain classification based on the first feature map;
a training module 606 configured to train the feature extractor, the classifier, and the first discriminator based on the first loss, the second loss, and the third loss.
According to a feasible implementation mode, the target images are obtained by segmenting remote sensing images obtained by respectively shooting different forest regions;
the source domain images have a plurality of labels, and the labels at least comprise the forest type and forest region;
the different forest regions respectively comprise forests belonging to the target forest category;
the target domain image and the source domain image are from different sensors; and/or,
the shooting seasons corresponding to the target domain image and the source domain image are different.
According to one possible embodiment, the feature extractor comprises a first extraction layer, a second extraction layer, and a second discriminator; the first extraction layer is used for extracting shallow features of the target image to obtain a second feature map; the second discriminator is used for judging a second probability value of the target image belonging to the source domain based on the second feature map; the second extraction layer is used for extracting deep features based on the second feature map and the second probability value to obtain a first feature map.
In one example, the feature extractor comprises a plurality of convolution blocks and a pooling layer; the first extraction layer comprises the convolution blocks before the pooling layer, and the second extraction layer comprises the pooling layer and the convolution blocks after it;
a single convolution block includes a convolution layer, a batch normalization layer, an instance normalization layer, and an activation layer.
In one example, the second extraction layer is configured to calculate entropy based on the second probability value, determine a feature attention value based on the calculated entropy value, and perform deep feature extraction based on the feature attention value and the second feature map to obtain a first feature map.
According to a possible embodiment, the device further comprises a fourth loss calculation module; wherein,
the fourth loss calculation module is used for obtaining a fourth loss based on the second probability values corresponding to the plurality of target images, wherein the fourth loss indicates an error of source domain classification based on the second feature map.
The training module 606 is configured to train the feature extractor, the classifier, and the first discriminator based on the first loss, the second loss, the third loss, and the fourth loss.
According to a possible implementation, the first loss calculation module 603 includes: an attention calculation unit and a loss calculation unit; wherein,
the attention calculation unit is used for calculating entropy based on the corresponding first probability value for each image of the plurality of target images, and obtaining an entropy attention value based on the calculated entropy value, wherein the entropy attention value indicates the attention degree of the target image;
the loss calculating unit is used for determining a first loss based on the entropy attention values corresponding to the target images and the probability values of the categories in the classification result.
Referring to fig. 7, based on the same concept as the method embodiment of the present invention, an embodiment of the present invention further provides an apparatus for target detection, including:
an image obtaining module 701, configured to obtain a target image to be detected;
a segmentation module 702, configured to segment the target image to determine multiple sub-images;
a classification module 703, configured to perform detection classification on the multiple sub-images through a feature extractor and a classifier, respectively, and determine a detection classification result; the feature extractor and the classifier are trained by the method of any one of the first aspect, and the detection classification result includes a plurality of target frames and respective categories of the target frames;
and a merging module 704, configured to merge the target frames belonging to the same category, and determine a target detection result of the target image.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. On the hardware level, the electronic device includes a processor 801 and a memory 802 storing execution instructions, and optionally further includes an internal bus 803 and a network interface 804. The Memory 802 may include a Memory 8021, such as a Random-Access Memory (RAM), and may further include a non-volatile Memory 8022 (e.g., at least 1 disk Memory); the processor 801, the network interface 804, and the memory 802 may be connected to each other by an internal bus 803, and the internal bus 803 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like; the internal bus 803 may be divided into an address bus, a data bus, a control bus, etc., which are indicated by only one double-headed arrow in fig. 8 for convenience of illustration, but do not indicate only one bus or one type of bus. Of course, the electronic device may also include hardware required for other services. When the processor 801 executes execution instructions stored by the memory 802, the processor 801 performs the method of any of the embodiments of the present invention and at least is used to perform the method as shown in fig. 3 or fig. 4.
In a possible implementation manner, the processor reads corresponding execution instructions from the nonvolatile memory into the memory and then executes the corresponding execution instructions, and corresponding execution instructions can also be obtained from other equipment, so as to form a model training device or a target detection device on a logic level. The processor executes the execution instructions stored in the memory to implement a model training method or an object detection method provided in any embodiment of the invention through the executed execution instructions.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Embodiments of the present invention further provide a computer-readable storage medium, which includes an execution instruction, and when a processor of an electronic device executes the execution instruction, the processor executes a method provided in any one of the embodiments of the present invention. The electronic device may specifically be the electronic device shown in fig. 8; the execution instruction is a computer program corresponding to a model training device or a target detection device.
It should be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
The embodiments of the present invention are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present invention and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (10)
1. A method of model training, comprising:
obtaining a plurality of target images based on remote sensing images obtained by respectively shooting different forest regions, wherein the target images comprise source domain images with labels and target domain images without labels, and the forest regions corresponding to the source domain images and the target domain images are different;
for each image of the plurality of target images:
inputting the target image into a feature extractor for feature extraction processing to obtain a first feature map;
inputting the first feature map into a classifier to obtain a classification result of the target image;
inputting the first feature map into a first discriminator to obtain a first probability value of the target image belonging to a source domain;
obtaining a first loss, which indicates the uncertainty of the classification results, based on the classification results and the first probability values corresponding to the plurality of target images respectively;
obtaining a second loss, which indicates the classification error of the classifier, based on the label of each source domain image and the corresponding classification result;
obtaining a third loss, which indicates the error of source-domain classification based on the first feature map, according to the first probability values corresponding to the target images respectively;
training the feature extractor, classifier, and first discriminator based on the first loss, the second loss, and the third loss.
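For illustration only (not part of the claims): below is a minimal PyTorch sketch of one training step implementing the three losses above. The gradient-reversal mechanism, the "1 + entropy" attention form, and the assumption that the first discriminator returns one logit per image are choices made for the sketch, not details fixed by the claim.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Gradient reversal: identity forward, negated gradient backward. This is a
    # common adversarial training device; the claim does not specify the mechanism.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

def train_step(extractor, classifier, disc1, optimizer,
               src_imgs, src_labels, tgt_imgs):
    imgs = torch.cat([src_imgs, tgt_imgs])           # the plurality of target images
    domain = torch.cat([torch.ones(len(src_imgs)),   # 1.0 = source domain
                        torch.zeros(len(tgt_imgs))])
    feat = extractor(imgs)                           # first feature maps
    logits = classifier(feat)                        # classification results
    # first probability values; disc1 is assumed to emit one logit per image
    p_src = torch.sigmoid(disc1(GradReverse.apply(feat))).squeeze(-1)

    # Second loss: classification error on labelled source images (src_labels: Long).
    loss2 = F.cross_entropy(logits[:len(src_imgs)], src_labels)

    # Third loss: error of source-domain classification from the first feature map.
    loss3 = F.binary_cross_entropy(p_src, domain)

    # First loss: entropy (uncertainty) of the classification results, weighted by
    # an entropy attention value derived from p_src (see claim 7 below).
    probs = logits.softmax(dim=1)
    cls_ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    p = p_src.clamp(1e-8, 1 - 1e-8)
    attn = 1.0 + (-(p * p.log() + (1 - p) * (1 - p).log()))  # assumed 1 + entropy form
    loss1 = (attn.detach() * cls_ent).mean()          # weight itself is not optimized

    loss = loss1 + loss2 + loss3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The reversal layer lets a single backward pass train the discriminator to separate the domains while simultaneously pushing the extractor toward domain-invariant features.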
2. The method according to claim 1, wherein the plurality of target images are obtained by segmenting remote sensing images respectively captured over different forest regions;
the source domain images carry multiple labels, including at least the forest category and the forest region;
the different forest regions respectively comprise forests belonging to the target forest category;
the target domain image and the source domain image are from different sensors; and/or,
the capture seasons corresponding to the target domain image and the source domain image are different.
3. The method of claim 1, wherein the feature extractor comprises a first extraction layer, a second discriminator, and a second extraction layer;
the first extraction layer is used for extracting shallow features of the target image to obtain a second feature map;
the second discriminator is used for determining, based on the second feature map, a second probability value that the target image belongs to the source domain;
the second extraction layer is used for extracting deep features based on the second feature map and the second probability value to obtain the first feature map.
4. The method of claim 3, wherein the feature extractor comprises a plurality of convolution blocks and a pooling layer, the first extraction layer comprises the convolution blocks before the pooling layer, and the second extraction layer comprises the pooling layer and the convolution blocks after it;
a single convolution block comprises a convolutional layer, a batch normalization layer, an instance normalization layer, and an activation layer.
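For illustration only: a minimal sketch of the convolution block of claim 4, assuming the batch-normalization and instance-normalization layers are simply stacked (the claim does not fix how the two are combined), followed by a hypothetical split into the first and second extraction layers of claim 3. All channel counts are placeholders.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: conv -> batch norm -> instance norm -> activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.InstanceNorm2d(out_ch, affine=True),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Hypothetical split: the blocks before the pooling layer form the first
# extraction layer; the pooling layer and the blocks after it form the second.
first_layer = nn.Sequential(ConvBlock(3, 64), ConvBlock(64, 128))
second_layer = nn.Sequential(nn.MaxPool2d(2), ConvBlock(128, 256))
```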
5. The method of claim 3, wherein the second extraction layer is configured to calculate entropy based on the second probability value, determine a feature attention value based on the calculated entropy value, and extract deep features based on the feature attention value and the second feature map to obtain the first feature map.
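One plausible reading of claim 5, sketched below under assumptions: the second discriminator's output is converted to a per-location binary-entropy map, and "1 + entropy" is used as the feature attention value that re-weights the second feature map before the deep blocks. The "1 + entropy" form and the per-location (rather than per-image) weighting are assumptions; locations whose domain is ambiguous (high entropy) are thereby emphasized as more transferable.

```python
import torch

def entropy_attention(p_src: torch.Tensor) -> torch.Tensor:
    # p_src: per-location probability of belonging to the source domain,
    # shape (N, 1, H, W). Binary entropy peaks where the domain is ambiguous.
    p = p_src.clamp(1e-8, 1 - 1e-8)
    ent = -(p * p.log() + (1 - p) * (1 - p).log())
    return 1.0 + ent                       # feature attention value from the entropy

def second_extraction(feat2, disc2, deep_blocks):
    p_src = torch.sigmoid(disc2(feat2))    # second probability value
    return deep_blocks(entropy_attention(p_src) * feat2)   # first feature map
```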
6. The method of claim 3, further comprising:
obtaining a fourth loss, which indicates the error of source-domain classification based on the second feature map, according to the second probability values corresponding to the target images respectively;
the training the feature extractor, the classifier, and the first discriminator based on the first loss, the second loss, and the third loss includes:
training the feature extractor, classifier, and first discriminator based on the first loss, the second loss, the third loss, and a fourth loss.
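Sketch of the additional term of claim 6, under the assumption that the second discriminator's output has been pooled to one source-domain probability per image; it reuses the domain-label convention of the training-step sketch after claim 1.

```python
import torch
import torch.nn.functional as F

def fourth_loss(p_src_shallow: torch.Tensor, domain: torch.Tensor) -> torch.Tensor:
    # p_src_shallow: (N,) second probability values, i.e. the second
    # discriminator's per-image output on the second (shallow) feature map.
    # domain: (N,) ground-truth domain labels, 1.0 = source, 0.0 = target.
    return F.binary_cross_entropy(p_src_shallow, domain)

# The total objective then extends the three-term combination of claim 1:
# loss = loss1 + loss2 + loss3 + fourth_loss(p_src_shallow, domain)
```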
7. The method of claim 1, wherein the obtaining a first loss based on the classification results and the first probability values corresponding to the plurality of target images comprises:
for each image of the plurality of target images, calculating entropy based on the corresponding first probability value, and deriving an entropy attention value from the calculated entropy, the entropy attention value indicating the degree of attention paid to the target image;
determining the first loss based on the entropy attention values corresponding to the target images and the per-category probability values in the classification results.
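A self-contained sketch of the claim-7 computation; the "1 + entropy" mapping from entropy to attention value is an assumption, since the claim only requires that the attention value be derived from the entropy.

```python
import torch

def first_loss(class_probs: torch.Tensor, p_src: torch.Tensor) -> torch.Tensor:
    # class_probs: (N, K) per-category probability values from the classifier.
    # p_src: (N,) first probability values from the first discriminator.
    p = p_src.clamp(1e-8, 1 - 1e-8)
    dom_ent = -(p * p.log() + (1 - p) * (1 - p).log())   # entropy of the domain prediction
    attn = 1.0 + dom_ent                                 # entropy attention value per image
    cls_ent = -(class_probs * class_probs.clamp_min(1e-8).log()).sum(dim=1)
    return (attn * cls_ent).mean()                       # attention-weighted uncertainty
```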
8. A method of target detection, comprising:
acquiring a target image to be detected;
segmenting the target image and determining a plurality of sub-images;
performing detection and classification on the plurality of sub-images respectively through a feature extractor and a classifier, and determining a detection classification result, wherein the feature extractor and the classifier are trained by the method of any one of claims 1 to 7, and the detection classification result comprises a plurality of target frames and the category of each target frame;
and merging the target frames belonging to the same category to determine a target detection result of the target image.
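For illustration only: a simplified sketch of the claim-8 pipeline. The tile size, the non-overlapping stride, the use of whole tiles as target frames, and the crude same-category merge are all assumptions made to keep the sketch short; a real merger would join only adjacent or overlapping frames.

```python
import torch

def detect(image, extractor, classifier, tile=512):
    # image: (C, H, W) tensor. Tile it into sub-images, classify each tile,
    # and treat each tile as one target frame for its predicted category.
    _, H, W = image.shape
    boxes = []                                    # (x1, y1, x2, y2, category)
    for y in range(0, H - tile + 1, tile):
        for x in range(0, W - tile + 1, tile):
            sub = image[:, y:y + tile, x:x + tile].unsqueeze(0)
            with torch.no_grad():
                cat = int(classifier(extractor(sub)).argmax(dim=1))
            boxes.append((x, y, x + tile, y + tile, cat))
    return merge_same_category(boxes)

def merge_same_category(boxes):
    # Crude placeholder for the merge step: collapse all frames of one
    # category into a single enclosing box.
    merged = []
    for cat in {b[4] for b in boxes}:
        group = [b for b in boxes if b[4] == cat]
        merged.append((min(b[0] for b in group), min(b[1] for b in group),
                       max(b[2] for b in group), max(b[3] for b in group), cat))
    return merged
```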
9. A computer-readable storage medium comprising executable instructions that, when executed by a processor of an electronic device, cause the processor to perform the method of any of claims 1 to 7, or the method of claim 8.
10. A computing device, comprising a processor and a memory storing execution instructions, wherein when the processor executes the execution instructions stored in the memory, the processor performs the method of any one of claims 1 to 7, or the method of claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210302367.3A CN114663760A (en) | 2022-03-25 | 2022-03-25 | Model training method, target detection method, storage medium and computing device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114663760A (en) | 2022-06-24
Family
ID=82030748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210302367.3A Pending CN114663760A (en) | 2022-03-25 | 2022-03-25 | Model training method, target detection method, storage medium and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114663760A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021120752A1 (en) * | 2020-07-28 | 2021-06-24 | 平安科技(深圳)有限公司 | Region-based self-adaptive model training method and device, image detection method and device, and apparatus and medium |
CN112131967A (en) * | 2020-09-01 | 2020-12-25 | 河海大学 | Remote sensing scene classification method based on multi-classifier anti-transfer learning |
CN113706551A (en) * | 2021-04-14 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Image segmentation method, device, equipment and storage medium |
CN113807420A (en) * | 2021-09-06 | 2021-12-17 | 湖南大学 | Domain self-adaptive target detection method and system considering category semantic matching |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117372791A (en) * | 2023-12-08 | 2024-01-09 | 齐鲁空天信息研究院 | Fine grain directional damage area detection method, device and storage medium |
CN117372791B (en) * | 2023-12-08 | 2024-03-22 | 齐鲁空天信息研究院 | Fine grain directional damage area detection method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111860670B (en) | Domain adaptive model training method, image detection method, device, equipment and medium | |
Li et al. | Localizing and quantifying damage in social media images | |
CN107133569B (en) | Monitoring video multi-granularity labeling method based on generalized multi-label learning | |
WO2021227366A1 (en) | Method for automatically and accurately detecting plurality of small targets | |
Soh et al. | ARKTOS: An intelligent system for SAR sea ice image classification | |
CN108108657A (en) | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning | |
CN110033018B (en) | Graph similarity judging method and device and computer readable storage medium | |
CN113469088B (en) | SAR image ship target detection method and system under passive interference scene | |
Fan et al. | A novel automatic dam crack detection algorithm based on local-global clustering | |
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
CN110163294B (en) | Remote sensing image change region detection method based on dimension reduction operation and convolution network | |
CN116596875A (en) | Wafer defect detection method and device, electronic equipment and storage medium | |
CN112651996B (en) | Target detection tracking method, device, electronic equipment and storage medium | |
CN110287970B (en) | Weak supervision object positioning method based on CAM and covering | |
CN114429577B (en) | Flag detection method, system and equipment based on high confidence labeling strategy | |
CN112418207B (en) | Weak supervision character detection method based on self-attention distillation | |
CN114663760A (en) | Model training method, target detection method, storage medium and computing device | |
CN117115565B (en) | Autonomous perception-based image classification method and device and intelligent terminal | |
CN108960005B (en) | Method and system for establishing and displaying object visual label in intelligent visual Internet of things | |
CN113283396A (en) | Target object class detection method and device, computer equipment and storage medium | |
Shishkin et al. | Implementation of yolov5 for detection and classification of microplastics and microorganisms in marine environment | |
CN108154107B (en) | Method for determining scene category to which remote sensing image belongs | |
CN117671312A (en) | Article identification method, apparatus, electronic device, and computer-readable storage medium | |
CN111353349B (en) | Human body key point detection method and device, electronic equipment and storage medium | |
Wen et al. | LESM-YOLO: An Improved Aircraft Ducts Defect Detection Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||