CN114663760A

CN114663760A - Model training method, target detection method, storage medium and computing device

Info

Publication number: CN114663760A
Application number: CN202210302367.3A
Authority: CN
Inventors: 郑珏鹏; 付昊桓; 徐一丹
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2022-06-24

Abstract

The invention relates to the technical field of machine learning, and provides a model training method, a target detection method, a storage medium and computing equipment, wherein the model training method comprises the following steps: acquiring a plurality of target images, wherein the plurality of target images comprise source domain images with labels and target domain images without labels; for each of the plurality of target images: substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map; inputting the first feature map into a classifier to obtain a classification result of the target image; inputting the first feature map into a first discriminator to obtain a first probability value of the target image belonging to the source domain; training a feature extractor, a classifier and a first discriminator based on target detection classification results, first probability values and labels of source domain images corresponding to the target images respectively so that the feature extractor learns feature distribution which can be shared by the source domain images and the target domain images, and the classifier can classify the target domain images more accurately.

Description

Model training method, target detection method, storage medium and computing device

技术领域technical field

本发明涉及机器学习技术领域，尤其涉及模型训练的方法、目标检测方法、存储介质及电子设备。The present invention relates to the technical field of machine learning, and in particular, to a method for model training, a method for target detection, a storage medium and an electronic device.

背景技术Background technique

大规模的林木调查是一个关键的研究问题。如今，丰富的遥感图像和深度学习算法的快速发展为大规模的林木比如油棕榈树检测带来了新的机遇。然而，大规模的树木计数和检测可能会面临不同采集条件的遥感图像，如不同的传感器、季节和环境，导致图像之间的分布不同。例如，如图1所示，图像A和图像B是两个不同的卫星图像。在这里，我们假设图像A为具有足够标签的图像，这里标签有4类，分别为油棕榈树之间的区域，油棕榈树，其他植被或裸地，不透水层或云，而图像B为没有标签的图像；由于传感器、采集日期和区域位置的差异，可以看到图像A和图像B之间在4个类别的直方图(用来表征图像像素值的分布情况)方面的明显差异；即使特征提取器和分类器在具有标签的图像下具有出色的检测精度和分类精度，当它直接应用于没有任何标签的图像时，比如图像B，特征提取器和分类器的性能可能会急剧下降。Large-scale tree surveys are a key research question. Today, abundant remote sensing images and the rapid development of deep learning algorithms have brought new opportunities for large-scale detection of forest trees such as oil palms. However, large-scale tree counting and detection may face remote sensing images with different acquisition conditions, such as different sensors, seasons, and environments, resulting in different distributions between images. For example, as shown in Figure 1, Image A and Image B are two different satellite images. Here, we assume that image A is an image with enough labels, here the labels have 4 categories, namely the area between oil palm trees, oil palm trees, other vegetation or bare ground, impervious layer or cloud, and image B is Images without labels; significant differences can be seen between Image A and Image B in the 4-category histograms (used to characterize the distribution of image pixel values) due to differences in sensor, acquisition date, and region location; even if Feature extractor and classifier have excellent detection accuracy and classification accuracy under images with labels, when it is directly applied to images without any labels, such as image B, the performance of feature extractor and classifier may drop sharply.

因此，如何提高特征提取器和分类器在无标签图像上的性能成为了亟待解决的问题。Therefore, how to improve the performance of feature extractors and classifiers on unlabeled images has become an urgent problem to be solved.

发明内容SUMMARY OF THE INVENTION

本发明提供了一种模型训练的方法、目标检测方法、计算机可读存储介质及电子设备，通过训练特征提取器以使其学习源域图像和目标域图像可共用的特征分布，使得分类器可对目标域图像进行较为准确的分类，实现将源域图像的标签迁移到目标域图像；另外，由于源域图像和目标图像对应的林木区域不同，可实现跨区域的林木检测。The present invention provides a model training method, a target detection method, a computer-readable storage medium and an electronic device. By training a feature extractor to make it learn the feature distribution shared by the source domain image and the target domain image, the classifier can The target domain image is more accurately classified, and the label of the source domain image is transferred to the target domain image; in addition, since the forest area corresponding to the source domain image and the target image are different, cross-regional tree detection can be realized.

第一方面，本发明提供了一种模型训练的方法，包括：In a first aspect, the present invention provides a method for model training, including:

基于对不同林木区域分别拍摄得到的遥感图像，得到多个目标图像，所述多个目标图像包括带标签的源域图像和无标签的目标域图像，所述源域图像和所述目标域图像所对应的林木区域不同；Based on the remote sensing images obtained from different forest areas, a plurality of target images are obtained, and the plurality of target images include a labeled source domain image and an unlabeled target domain image, the source domain image and the target domain image The corresponding forest areas are different;

对于所述多个目标图像的各图像：For each image of the plurality of target images:

将所述目标图像代入特征提取器进行特征提取处理，得到第一特征图；Substitute the target image into a feature extractor to perform feature extraction processing to obtain a first feature map;

将所述第一特征图输入分类器中，得到对所述目标图像的分类结果；Inputting the first feature map into a classifier to obtain a classification result for the target image;

将所述第一特征图输入第一判别器中，得到所述目标图像属于源域的第一概率值；Inputting the first feature map into the first discriminator to obtain the first probability value that the target image belongs to the source domain;

基于所述多个目标图像各自对应的目标检测分类结果和所述第一概率值，得到第一损失，其指示了所述分类结果的不确定度；obtaining a first loss, which indicates the uncertainty of the classification result, based on the target detection classification results and the first probability value corresponding to the plurality of target images;

基于所述源域的各图像具有的标签、对应的分类结果，得到第二损失，其指示了所述分类器的分类误差；Obtaining a second loss indicating the classification error of the classifier based on the label and the corresponding classification result of each image of the source domain;

基于所述多个目标图像各自对应的第一概率值，得到第三损失，其指示了基于第一特征图进行源域分类的误差；obtaining a third loss based on the respective first probability values of the plurality of target images, which indicates the error of the source domain classification based on the first feature map;

基于所述第一损失、所述第二损失和所述第三损失，对所述特征提取器、分类器和第一判别器进行训练。The feature extractor, classifier and first discriminator are trained based on the first loss, the second loss and the third loss.

第二方面，本发明提供了一种目标检测的方法，包括：In a second aspect, the present invention provides a method for target detection, comprising:

获取待检测的目标图像；Obtain the target image to be detected;

对所述目标图像进行分割，确定多个子图；Segmenting the target image to determine a plurality of sub-images;

通过特征提取器和分类器，对所述多个子图分别进行检测分类，确定检测分类结果；所述特征提取器和分类器通过上述第一方面任一方法训练得到，所述检测分类结果包括多个目标框和所述多个目标框各自的类别；The feature extractor and the classifier are used to detect and classify the plurality of sub-images respectively, and the detection and classification results are determined; the feature extractor and the classifier are obtained by training any method in the first aspect, and the detection and classification results include multiple a target frame and the respective categories of the plurality of target frames;

对属于相同类别的各目标框进行合并，确定所述目标图像的目标检测结果。The target frames belonging to the same category are merged to determine the target detection result of the target image.

第三方面，本发明提供了一种模型训练的装置，包括：In a third aspect, the present invention provides a device for model training, comprising:

图像获取模块，用于基于对不同林木区域分别拍摄得到的遥感图像，得到多个目标图像，所述多个目标图像包括带标签的源域图像和无标签的目标域图像，所述源域图像和所述目标域图像所对应的林木区域不同；The image acquisition module is used to obtain a plurality of target images based on the remote sensing images obtained by shooting different forest areas respectively, and the plurality of target images include a labeled source domain image and an unlabeled target domain image, and the source domain image is different from the forest area corresponding to the target domain image;

分类模块，用于对于所述多个目标图像的各图像：将所述目标图像代入特征提取器进行特征提取处理，得到第一特征图；将所述第一特征图输入分类器中，得到对所述目标图像的分类结果；将所述第一特征图输入第一判别器中，得到所述目标图像属于源域的第一概率值；The classification module is used for each image of the plurality of target images: substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map; inputting the first feature map into the classifier to obtain a pair of The classification result of the target image; inputting the first feature map into the first discriminator to obtain the first probability value that the target image belongs to the source domain;

第一损失计算模块，用于基于所述多个目标图像各自对应的目标检测分类结果和所述第一概率值，得到第一损失，其指示了所述分类结果的不确定度；a first loss calculation module, configured to obtain a first loss based on the target detection classification results corresponding to the plurality of target images and the first probability value, which indicates the uncertainty of the classification result;

第二损失计算模块，用于基于所述源域的各图像具有的标签、对应的分类结果，得到第二损失，其指示了所述分类器的分类误差；A second loss calculation module, configured to obtain a second loss based on the label and the corresponding classification result of each image in the source domain, which indicates the classification error of the classifier;

第三损失计算模块，用于基于所述多个目标图像各自对应的第一概率值，得到第三损失，其指示了基于第一特征图进行源域分类的误差；a third loss calculation module, configured to obtain a third loss based on the respective first probability values of the plurality of target images, which indicates the error of the source domain classification based on the first feature map;

训练模块，用于基于所述第一损失、所述第二损失和所述第三损失，对所述特征提取器、分类器和第一判别器进行训练。A training module for training the feature extractor, the classifier and the first discriminator based on the first loss, the second loss and the third loss.

第四方面，本发明提供了一种目标检测的装置，包括：In a fourth aspect, the present invention provides a device for target detection, comprising:

图像获取模块，用于获取待检测的目标图像；an image acquisition module for acquiring the target image to be detected;

分割模块，用于对所述目标图像进行分割，确定多个子图；a segmentation module, for segmenting the target image and determining a plurality of sub-images;

分类模块，用于通过特征提取器和分类器，对所述多个子图分别进行检测分类，确定检测分类结果；所述特征提取器和分类器通过上述第一方面中任一所述的方法训练得到，所述检测分类结果包括多个目标框和所述多个目标框各自的类别；A classification module, configured to detect and classify the plurality of sub-images respectively through a feature extractor and a classifier, and determine a detection and classification result; the feature extractor and the classifier are trained by any one of the methods described in the first aspect above. Obtain, the detection and classification result includes multiple target frames and respective categories of the multiple target frames;

合并模块，用于对属于相同类别的各目标框进行合并，确定所述目标图像的目标检测结果。The merging module is used for merging each target frame belonging to the same category to determine the target detection result of the target image.

第五方面，本发明提供了一种计算机可读存储介质，包括执行指令，当电子设备的处理器执行所述执行指令时，所述处理器执行如第一方面或第二方面中任一所述的方法。In a fifth aspect, the present invention provides a computer-readable storage medium, comprising execution instructions, when a processor of an electronic device executes the execution instructions, the processor executes any one of the first aspect or the second aspect. method described.

第六方面，本发明提供了一种电子设备，包括处理器以及存储有执行指令的存储器，当所述处理器执行所述存储器存储的所述执行指令时，所述处理器执行如第一方面或第二方面中任一所述的方法。In a sixth aspect, the present invention provides an electronic device, comprising a processor and a memory storing execution instructions. When the processor executes the execution instructions stored in the memory, the processor executes the first aspect. or the method of any one of the second aspects.

本发明提供了一种模型训练的方法、装置、计算机可读存储介质及电子设备，该方法基于对不同林木区域分别拍摄得到的遥感图像，得到多个目标图像，多个目标图像包括有标签的源域图像和无标签的目标域图像，源域图像和目标域图像所对应的林木区域不同；对于多个目标图像的各图像：然后，将目标图像代入特征提取器进行特征提取处理，得到第一特征图；将第一特征图输入分类器中，得到对目标图像的分类结果；将第一特征图输入第一判别器中，得到目标图像属于源域的第一概率值；然后，基于多个目标图像各自对应的目标检测分类结果和第一概率值，得到第一损失，其指示了分类结果的不确定度；然后，基于源域的各图像具有的标签、对应的分类结果，得到第二损失，其指示了分类器的分类误差；然后，基于多个目标图像各自对应的第一概率值，得到第三损失，其指示了基于第一特征图进行源域分类的误差；之后，基于第一损失、第二损失和第三损失，对特征提取器、分类器和第一判别器进行训练。综上所述，本发明的技术方案通过训练特征提取器以使其学习源域图像和目标域图像可共用的特征分布，使得分类器可对目标域图像进行较为准确的分类，实现将源域图像的标签迁移到目标域图像；另外，由于源域图像和目标图像对应的林木区域不同，可实现跨区域的林木检测。The present invention provides a method, device, computer-readable storage medium and electronic equipment for model training. The method obtains multiple target images based on remote sensing images obtained from different forest areas respectively, and the multiple target images include labeled The source domain image and the unlabeled target domain image, the forest area corresponding to the source domain image and the target domain image are different; for each image of multiple target images: Then, the target image is substituted into the feature extractor for feature extraction processing, and the first a feature map; input the first feature map into the classifier to obtain the classification result of the target image; input the first feature map into the first discriminator to obtain the first probability value that the target image belongs to the source domain; then, based on multiple The target detection classification result and the first probability value corresponding to each target image, and the first loss is obtained, which indicates the uncertainty of the classification result; then, based on the label of each image in the source domain and the corresponding classification result, the first loss is obtained. The second loss indicates the classification error of the classifier; then, based on the first probability values corresponding to each of the multiple target images, a third loss is obtained, which indicates the error of the source domain classification based on the first feature map; The first loss, the second loss and the third loss, train the feature extractor, the classifier and the first discriminator. To sum up, the technical solution of the present invention enables the classifier to more accurately classify the target domain images by training the feature extractor to learn the feature distribution that can be shared by the source domain image and the target domain image, and realizes the source domain image. The label of the image is transferred to the target domain image; in addition, since the forest tree regions corresponding to the source domain image and the target image are different, cross-regional tree detection can be realized.

上述的非惯用的优选方式所具有的进一步效果将在下文中结合具体实施方式加以说明。Further effects of the above-mentioned non-conventional preferred mode will be described below in conjunction with specific embodiments.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明中记载的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the embodiments of the present invention or the existing technical solutions more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the existing technology. Obviously, the accompanying drawings in the following description are only the For some embodiments described in the invention, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without any creative effort.

图1为本发明实施例提供的不同采集条件的图像的直方图的示意图；1 is a schematic diagram of a histogram of images with different acquisition conditions provided by an embodiment of the present invention;

图2为本发明实施例提供的识别模型的结构示意图；2 is a schematic structural diagram of an identification model provided by an embodiment of the present invention;

图3为本发明实施例提供的一种模型训练的方法的流程示意图；3 is a schematic flowchart of a method for model training provided by an embodiment of the present invention;

图4为本发明实施例提供的一种目标检测的方法的流程示意图一；FIG. 4 is a schematic flowchart 1 of a method for target detection provided by an embodiment of the present invention;

图5为本发明实施例提供的一种目标检测的方法的流程示意图二；5 is a second schematic flowchart of a method for target detection provided by an embodiment of the present invention;

图6为本发明实施例提供的一种模型训练的装置的结构示意图；6 is a schematic structural diagram of an apparatus for model training provided by an embodiment of the present invention;

图7为本发明实施例提供的一种目标检测的装置的结构示意图；7 is a schematic structural diagram of an apparatus for target detection provided by an embodiment of the present invention;

图8为本发明实施例提供的一种电子设备的结构示意图。FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案和优点更加清楚，下面将结合具体实施例及相应的附图对本发明的技术方案进行清楚、完整地描述。显然，所描述的实施例仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments and corresponding drawings. Obviously, the described embodiments are only some, but not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

首先对源域和目标域进行介绍。 First, the source domain and target domain are introduced .

迁移学习(Transfer learning)具体而言是将源域上学习到的知识或模式应用到不同但相关的目标域中。其中，源域(source domain)表示与测试样本不同的领域，但是有丰富的监督信息；目标域(target domain)表示测试样本所在的领域，无标签或者只有少量标签。源域和目标域的数据分布不同，但任务相同。这里，任务：就是要做的事情，比如林木识别分类。Transfer learning specifically applies knowledge or patterns learned on the source domain to a different but related target domain. Among them, the source domain (source domain) represents a field different from the test sample, but has rich supervision information; the target domain (target domain) represents the field where the test sample is located, with no labels or only a few labels. The data distribution of the source and target domains is different, but the tasks are the same. Here, the task: is what to do, such as forest tree identification and classification.

在本发明实施例中，通过源域和目标域进行模型训练，实现林木检测分类的任务，比如，油棕榈树的检测分类。应当注意，这里的林木为某一类别的林木，在实际应用中，也可以考虑多种类别的林木；本发明实施例以某一类别(为了便于描述和区分，称为目标类别)的林木为例进行描述。其中，源域包括具有标签的多个图像，目标域包括无标签的多个图像。这里，由于林木和林木间区域的相似性较高，为了提高林木识别的准确性，本发明实施例的标签包括林木间区域，林木类别，其他的标签可结合实际情况设计，比如，其他植被或裸地，不透水层或云等。In the embodiment of the present invention, the model training is performed through the source domain and the target domain, so as to realize the task of detection and classification of forest trees, for example, detection and classification of oil palm trees. It should be noted that the forest trees here are trees of a certain type, and in practical applications, various types of forest trees can also be considered; in the embodiment of the present invention, the forest trees of a certain type (for the convenience of description and distinction, referred to as the target type) are example is described. The source domain includes multiple images with labels, and the target domain includes multiple images without labels. Here, due to the high similarity between forest trees and areas between forest trees, in order to improve the accuracy of forest tree identification, the labels in this embodiment of the present invention include areas between trees and forest tree categories. Other labels can be designed in combination with actual conditions, such as other vegetation or Bare ground, impervious layers or clouds, etc.

在实际应用中，源域和目标域的图像通常都是遥感图像，而遥感图像通常包含大片林木，因此，源域和目标域的图像都是对遥感图像分割后的图像。在一个具体的实施方式中，在构建源域时，预先设置多个标签；然后，选择对一个区域或多个区域(包括目标类别的林木)分别拍摄得到的遥感图像，并对这些遥感图像进行人工标记，比如，标注出区域和区域所属的标签，之后，对标注的遥感图像进行分割，分割后的图像(携带标签，比如，图像为标注出的区域的部分，即可将标注出的区域的标签作为该图像的标签；如果图像具有两个标签，即包括标注的不同区域之间的分界线，则面积较大的区域的标签作为该图像的标签，如果面积差不多即难以确定图像所属的标签，则可舍弃该图像)形成源域；在实际应用中，也可以先分割在标注标签。之后，对不同于上述区域的其他的一个或多个区域(包括目标类别的林木)分别拍摄的遥感图像进行分割，分割后的图像形成目标域。示例地，源域和目标域中的图像的尺寸可以为17×17像素。In practical applications, the images of the source domain and the target domain are usually remote sensing images, and the remote sensing images usually contain large forests. Therefore, the images of the source domain and the target domain are images after segmentation of the remote sensing images. In a specific embodiment, when constructing the source domain, multiple labels are preset; then, remote sensing images obtained from one area or multiple areas (including forest trees of the target category) are selected, and these remote sensing images are processed Manual labeling, for example, label the area and the label to which the area belongs, then segment the labeled remote sensing image, and the segmented image (carrying the label, for example, if the image is part of the labeled area, the labeled area can be The label of the image is used as the label of the image; if the image has two labels, that is, the boundary between the different areas marked, the label of the area with a larger area is used as the label of the image. If the area is similar, it is difficult to determine the image to which the image belongs. label, the image can be discarded) to form the source domain; in practical applications, the label can also be segmented first. Afterwards, segment the remote sensing images taken separately from one or more areas different from the above area (including forest trees of the target category), and the segmented images form the target domain. Illustratively, the size of the images in the source and target domains may be 17×17 pixels.

上述目标域和源域的构建仅仅作为示例，并不构成具体限定，只要保证源域和目标域的采集条件是不同的，如不同的传感器、季节、区域，优选区域不同。The construction of the above target domain and source domain is only an example, and does not constitute a specific limitation, as long as the acquisition conditions of the source domain and the target domain are different, such as different sensors, seasons, regions, and preferred regions are different.

需要说明的是，本发明实施例的源域和目标域的数据分布不同。若采用大量带标签的源域进行训练，训练得到的分类器在目标域上并不会有很好的表现。基于此，本发明实施例提出了对抗迁移学习的方式进行模型的训练。It should be noted that the data distribution of the source domain and the target domain in this embodiment of the present invention are different. If a large number of labeled source domains are used for training, the trained classifier will not perform well on the target domain. Based on this, the embodiment of the present invention proposes an adversarial transfer learning method for model training.

接下来对对抗迁移学习进行介绍。 Next, we will introduce the adversarial transfer learning .

图2示出了本发明实施例提供的识别模型的结构。本发明实施例中，如图2所示，识别模型包括特征提取器、分类器、第一判别器。FIG. 2 shows a structure of a recognition model provided by an embodiment of the present invention. In the embodiment of the present invention, as shown in FIG. 2 , the recognition model includes a feature extractor, a classifier, and a first discriminator.

对抗迁移学习是无监督深度迁移学习的一种形式，使用特征提取器，分类器和第一判别器三部分组成对抗迁移学习网络模型。对抗迁移学习的目的是如何从源域图像和目标域图像中提取特征，使得第一判别器无法区分提取的特征是来自源域，还是目标域。Adversarial transfer learning is a form of unsupervised deep transfer learning, which uses three parts: feature extractor, classifier and first discriminator to form an adversarial transfer learning network model. The purpose of adversarial transfer learning is how to extract features from the source domain image and the target domain image, so that the first discriminator cannot distinguish whether the extracted features are from the source domain or the target domain.

其中，特征提取器用来将图像映射到特定的特征空间，使分类器能够分辨出来自源域图像的标签的同时，第一判别器无法区分图像来自哪个源域还是目标域。Among them, the feature extractor is used to map the image to a specific feature space, so that the classifier can distinguish the labels from the source domain image, while the first discriminator cannot distinguish which source domain or target domain the image comes from.

特征提取器包括N个卷积块，比如，N＝5，下文以N＝5为例进行描述。在一个例子中，卷积块包括了一个卷积层、一个批归一化(Batch Normalization，BN)层、一个实例归一化(Instance Normalization,IN)层和一个激活层。卷积层通过卷积神经网络(Convolutional Neural Network,CNN)实现图像处理；激活层通过激活函数进行处理，比如，激活函数可以为ReLU(Rectified Linear Units)；需要说明的是，尽管BN层可以有效加速模型收敛，但会使CNN对图像的变化不敏感。所以增加了IN层来消除不同个体的差异，从而增强了网络的泛化性。另外，BN层和IN层为现有技术，本发明实施例对此不再赘述。The feature extractor includes N convolution blocks, for example, N=5, which will be described below by taking N=5 as an example. In one example, a convolutional block includes a convolutional layer, a Batch Normalization (BN) layer, an Instance Normalization (IN) layer, and an activation layer. The convolutional layer realizes image processing through the Convolutional Neural Network (CNN); the activation layer is processed through the activation function, for example, the activation function can be ReLU (Rectified Linear Units); it should be noted that although the BN layer can be effective Accelerates model convergence, but makes the CNN insensitive to image changes. Therefore, the IN layer is added to eliminate the differences between different individuals, thereby enhancing the generalization of the network. In addition, the BN layer and the IN layer are in the prior art, and details are not described in this embodiment of the present invention.

进一步地，特征提取器还包括位于第j个卷积块和第j+1个卷积块之间的池化层，j为大于1的正整数。示例地，j＝2。示例地，池化层采用的池化方式可以为最大池化，也可以为平均池化。对应地，在特征提取器包括池化层的基础上，特征提取器还可以包括第二判别器。其中，第二判别器用于输出图像属于源域的概率

示例地，当

大于等于0.5时，代表特征图属于源域，当

小于0.5时，代表特征图属于目标域。进一步地，特征提取器的最后一个卷积块的卷积层输出的特征图通过如下公式(1)示出的公式处理得到新的特征图h_i：Further, the feature extractor further includes a pooling layer located between the jth convolution block and the j+1th convolution block, where j is a positive integer greater than 1. Illustratively, j=2. For example, the pooling method adopted by the pooling layer may be maximum pooling or average pooling. Correspondingly, on the basis that the feature extractor includes a pooling layer, the feature extractor may further include a second discriminator. Among them, the second discriminator is used to output the probability that the image belongs to the source domain

For example, when

When it is greater than or equal to 0.5, it means that the feature map belongs to the source domain.

When it is less than 0.5, the representative feature map belongs to the target domain. Further, the feature map output by the convolution layer of the last convolution block of the feature extractor is processed by the formula shown in the following formula (1) to obtain a new feature map h _i :

其中，f_i为最后一层卷积层输出的特征图；h_i是包含了迁移能力信息的新特征图；

表示特征层次注意力值。图像的迁移能力更强的特征能够被赋予更大的特征层次注意力值。Among them, f _i is the feature map output by the last convolutional layer; _hi is the new feature map containing the transfer ability information;

Represents the feature-level attention value. The more transferable features of the image can be assigned a larger feature-level attention value.

需要说明的是，特征层次注意力的目的是为了找到在源域与目标域之间迁移性更强的图像的特征，从而将源域和目标域从原始特征空间映射到新的特征空间(源域和目标域具有相同的数据分布)，使得第二判别器无法区分图像是来自目标域，还是源域。为了度量这个迁移性，本发明实施例通过信息熵的方法来描述不确定度。信息熵也叫做香农熵，通过如下公式(2)计算。It should be noted that the purpose of feature-level attention is to find the features of images that are more mobile between the source and target domains, thereby mapping the source and target domains from the original feature space to the new feature space (source domain and target domain have the same data distribution), so that the second discriminator cannot distinguish whether the image is from the target domain or the source domain. In order to measure this mobility, the embodiment of the present invention describes the uncertainty by the method of information entropy. Information entropy, also called Shannon entropy, is calculated by the following formula (2).

E(p)＝-∑_dP_d·log(P_d) (2)E(p)=-∑ _d P _d ·log(P _d ) (2)

其中,d＝0,P_d代表图像属于目标域的概率；d＝1,P_d代表图像属于源域的概率；本申请实施例仅仅考虑d＝1的情况。根据信息理论，熵越大，信息量就越大，图像的迁移性也就更强。对应的，特征提取器可以通过如下公式(3)计算特征层次注意力值(V_i ^F)：Among them, d=0, P _d represents the probability that the image belongs to the target domain; d=1, P _d represents the probability that the image belongs to the source domain; the embodiment of the present application only considers the case of d=1. According to information theory, the greater the entropy, the greater the amount of information, and the greater the mobility of the image. Correspondingly, the feature extractor can calculate the feature-level attention value (V _i ^F ) by the following formula (3):

其中，E(·)输出的结果为信息熵。Among them, the result of E(·) output is the information entropy.

这样，特征提取器就能有效地度量特征图的迁移能力，知道哪些特征图更适合用于分类，以及哪些特征图对分类有负作用。因此，在特征提取器的最后一个卷积块的卷积层输出的特征图和特征层次注意力值之间建立一个连接，生成的新的特征图，新生成的特征图包含了迁移能力信息。In this way, the feature extractor can effectively measure the transferability of feature maps, and know which feature maps are more suitable for classification, and which feature maps have a negative effect on classification. Therefore, a connection is established between the feature map output by the convolutional layer of the last convolutional block of the feature extractor and the feature-level attention value, and a new feature map is generated. The newly generated feature map contains the transfer ability information.

其中，分类器，对源域图像进行分类，尽可能分出正确的标签。Among them, the classifier classifies the source domain images and separates the correct labels as much as possible.

其中，第一判别器，对特征空间的图像进行分类，尽可能分出图像来自源域，还是目标域。Among them, the first discriminator classifies the images in the feature space, and tries to separate the images from the source domain or the target domain as much as possible.

另外，特征提取器和分类器组成分类检测部分，特征提取器和第一判别器组成域判别部分。特征提取器和第一判别器的优化目标相反，第一判别器试图判断出图像来自于源域，还是目标域，特征提取器试图使第一判别器判断不出图像的来源，对抗体现在两者的优化目标，通过对抗最终使得源域图像和目标域图像通过特征提取器的输出的特征分布相近，即把具有不同分布的源域和目标域，映射到同一个特征空间，寻找某一种度量准则，使其在这个空间上的“距离”尽可能近；进而使得分类器能够同时对源域图像和目标域图像的进行准确的分类。In addition, the feature extractor and the classifier make up the classification detection part, and the feature extractor and the first discriminator make up the domain discrimination part. The optimization goal of the feature extractor and the first discriminator is opposite. The first discriminator tries to determine whether the image comes from the source domain or the target domain. The feature extractor tries to make the first discriminator unable to determine the source of the image. The optimization goal of the author, through confrontation, finally makes the feature distribution of the source domain image and the target domain image output through the feature extractor similar, that is, map the source domain and target domain with different distributions to the same feature space, and find a certain kind of The metric criterion is to make the "distance" in this space as close as possible; thus, the classifier can accurately classify the source domain image and the target domain image at the same time.

接下来对本发明实施例提供的识别模型的损失函数进行介绍。Next, the loss function of the recognition model provided by the embodiment of the present invention is introduced.

其中，损失函数参见如下公式(4)。Among them, the loss function is shown in the following formula (4).

其中，L_S表示浅层损失域损失；L_D表示深层特征域损失；L_E表示熵损失；μ、α和β表示平衡浅层特征域损失、深层特征域损失和熵损失的超参数。

表示带有标签的源域图像的分类损失。Among them, _LS represents the shallow loss domain loss; _LD represents the deep feature domain loss; _LE represents the entropy loss; μ, α and β represent the hyperparameters that balance the shallow feature domain loss, the deep feature domain loss and the entropy loss.

Represents the classification loss for source domain images with labels.

其中，

通过如下公式(5)计算：in,

Calculated by the following formula (5):

其中,L_y(·)表示交叉熵损失函数；G_y(·)表示分类器，输出为预测出的源域中第i个图像属于y_i类别的概率。Among them, _Ly (·) represents the cross-entropy loss function; G _y (·) represents the classifier, and the output is the predicted probability that the i-th image in the source domain belongs to the _yi category.

其中，浅层特征域损失用于使得特征提取器学习到源域与目标域之间迁移性更强的特征，使得特征提取器中的第二判别器无法区分图像是来自目标域，还是源域。浅层特征域损失通过如下公式(6)计算：Among them, the shallow feature domain loss is used to make the feature extractor learn features that are more transferable between the source domain and the target domain, so that the second discriminator in the feature extractor cannot distinguish whether the image is from the target domain or the source domain. . The shallow feature domain loss is calculated by the following formula (6):

其中，G_d(·)表示第二判别器，L_d(·)表示G_d(·)的二分类交叉熵损失，对于源域的图像，

等于1，对于目标域的图像

等于0；F′^S表示第j个卷积块输出的源域的第i个图像的特征图；F′^T来表示表示第j个卷积块输出的目标域的第i个图像的特征图。值得注意的是，L_S表示基于浅层特征进行源域分类的误差。where G _d ( ) represents the second discriminator, L _d ( ) represents the binary cross-entropy loss of G _d ( ), and for the images in the source domain,

equal to 1, for images in the target domain

Equal to 0; F′ ^S represents the feature map of the i-th image in the source domain output by the j-th convolution block; F′ ^T represents the feature map of the i-th image in the target domain output by the j-th convolution block . It is worth noting that _LS represents the error of source domain classification based on shallow features.

需要说明的是，特征提取器输出的特征图的生成经过了池化层，从而导致了浅层特征信息的丢失。另外，考虑到每一张图像的迁移性都是不一样的，对于在特征空间不相似的图像，会对源域的特征向目标域迁移有负作用，从而影响分类器的分类性能。考虑到特征提取器需要得到具有较好的可迁移性的特征，使得第二判别器无法区分图像是来自目标域，还是源域，同时，分类器利用特征提取器输出的特征，可以很好的完成分类任务。如图2所示，本发明实施例在池化层之前设计了浅层特征域损失。It should be noted that the feature map output by the feature extractor is generated through the pooling layer, which leads to the loss of shallow feature information. In addition, considering that the transferability of each image is different, for images that are not similar in the feature space, it will have a negative effect on the transfer of features from the source domain to the target domain, thereby affecting the classification performance of the classifier. Considering that the feature extractor needs to obtain features with good transferability, so that the second discriminator cannot distinguish whether the image is from the target domain or the source domain. Complete the classification task. As shown in Figure 2, the embodiment of the present invention designs a shallow feature domain loss before the pooling layer.

其中，深层特征域损失用于使得特征提取器学习到源域图像与目标域图像之间迁移性更强的图像的深层的特征，通过如下公式(7)计算：Among them, the deep feature domain loss is used to make the feature extractor learn the deep features of the image with stronger migration between the source domain image and the target domain image, and it is calculated by the following formula (7):

其中，g_d(·)表示第一判别器，L_d(·)表示g_d(·)的二分类交叉熵损失。对于源域图像，

等于1，对于目标域图像

等于0。更进一步解释，g_d(·)的输出即是图像属于源域的概率

这样，我们的域损失就包括了浅层特征域损失L_S和深层特征的域损失L_D。where g _d (·) represents the first discriminator, and L _d (·) represents the binary cross-entropy loss of g _d (·). For source domain images,

equal to 1, for target domain images

equal to 0. Further explanation, the output of g _d ( ) is the probability that the image belongs to the source domain

In this way, our domain loss includes the domain loss LS for shallow features and the domain loss _LD for _deep features.

需要说明的是，考虑到每一张图像的迁移性都是不一样的，对于在特征空间不相似的图像，会对源域的知识向目标域迁移有负作用，从而影响分类器的分类性能。如图2所示，本发明实施例在分类之前设计了深层特征域损失。It should be noted that, considering that the transferability of each image is different, for images that are not similar in the feature space, it will have a negative effect on the transfer of knowledge from the source domain to the target domain, thereby affecting the classification performance of the classifier. . As shown in Figure 2, the embodiment of the present invention designs a deep feature domain loss before classification.

其中，L_E通过如下公式(8)计算：Among them, _LE is calculated by the following formula (8):

其中，L_E表示熵损失，用于说明分类器的分类结果的不确定性；V_i ^F表示熵层次注意力值；C表示类别的数目，例如4类；p_i，c表示第i图像对应的预测类别为c的概率。Among them, LE represents the entropy loss, which is used to describe the uncertainty of the classification result of the classifier; V _i ^F represents the entropy level attention value; C represents the number of categories, such as 4 categories; p _{i, c} represent the corresponding image of the _i -th image The probability that the predicted class is c.

需要说明的是，熵层次注意力与特征层次注意力相似，用来描述对图像的信息量损失的关注程度。迁移性较小的图像可能会使得特征提取器学习不能学习到迁移性较强的特征，降低分类器的分类精度，这里，迁移性更强的图像有更大的熵层次注意力值，迁移性较小的图像的熵层次注意力值较小，使得特征提取器更多的关注迁移性更强的特征。对应的，可以通过如下公式(9)计算熵层次注意力值(V_i ^E)：It should be noted that entropy-level attention is similar to feature-level attention, and is used to describe the degree of attention to the loss of information in images. Images with less transferability may prevent the feature extractor from learning more transferable features and reduce the classification accuracy of the classifier. Smaller images have smaller entropy-level attention values, making the feature extractor pay more attention to more transferable features. Correspondingly, the entropy level attention value (V _i ^E ) can be calculated by the following formula (9):

其中,E(·)输出的结果为信息熵。Among them, the result of E(·) output is the information entropy.

应当理解的是，在信息论的熵函数的启示下，熵损失用来减少输出类别概率的不确定性。本发明实施例采用目标图像，则使用熵损失有两个好处，一方面由于图像中的林木和林木间区域有较大的相似性，熵损失可以改善易混淆样本预测的置信度；另一方面，对于相似性很差的影像，如果强制性地提高他们的置信度，会对分类器造成不良影响；因此，将熵层次注意力值的权重赋予到熵损失上，这样基于熵层次注意力的最小熵正则化就使我们的分类器的预测更加可信。熵损失正则化作为损失函数的惩罚项，对损失函数中的某些参数做一些限制。It should be understood that, inspired by the entropy function of information theory, the entropy loss is used to reduce the uncertainty of the output class probability. In the embodiment of the present invention, if the target image is used, the entropy loss has two advantages. On the one hand, since the forest trees in the image and the area between the trees have a large similarity, the entropy loss can improve the confidence of the easily confused sample prediction; on the other hand , for images with poor similarity, if they are forced to increase their confidence, it will have a bad impact on the classifier; therefore, the weight of the entropy-level attention value is assigned to the entropy loss, so that the entropy-level attention-based Minimum entropy regularization makes our classifier's predictions more credible. Entropy loss regularization is used as a penalty term for the loss function, and some restrictions are imposed on some parameters in the loss function.

如图3所示，为本发明实施例提供的一种模型训练的方法。本发明实施例所提供的方法可应用在电子设备上，具体可以应用于服务器或一般计算机上。本实施例中，所述方法具体包括以下步骤：As shown in FIG. 3 , it is a method for model training provided by an embodiment of the present invention. The method provided by the embodiment of the present invention can be applied to an electronic device, and specifically can be applied to a server or a general computer. In this embodiment, the method specifically includes the following steps:

步骤301，基于对不同林木区域分别拍摄得到的遥感图像，得到多个目标图像，多个目标图像包括有标签的源域图像和无标签的目标域图像，目标域图像和源域图像对应的林木区域不同。Step 301, obtaining multiple target images based on the remote sensing images obtained by shooting different forest areas respectively, and the multiple target images include a labeled source domain image and an unlabeled target domain image, and the target domain image and the forest tree corresponding to the source domain image. Regions are different.

根据一种可行的实现方式，对第一传感器拍摄第一林木区域得到的第一遥感图像进行分割，得到多个图像；然后，为这些图像添加标签，得到多个源域图像。当然，在实际应用中，第一传感器通常拍摄了多个第一遥感图像，每个第一遥感图像的处理过程相同。According to a feasible implementation manner, the first remote sensing image obtained by photographing the first forest area by the first sensor is segmented to obtain multiple images; then, tags are added to these images to obtain multiple source domain images. Of course, in practical applications, the first sensor usually captures a plurality of first remote sensing images, and the processing process of each first remote sensing image is the same.

根据一种可行的实现方式，对第二传感器拍摄第二林木区域得到的第二遥感图像进行分割，得到多个图像；无需为这些图像添加标签，得到多个目标域图像。当然，在实际应用中，第二传感器通常拍摄了多个第二遥感图像，每个第二遥感图像的处理过程相同。According to a feasible implementation manner, the second remote sensing image obtained by photographing the second forest area by the second sensor is segmented to obtain multiple images; there is no need to add labels to these images to obtain multiple target domain images. Of course, in practical applications, the second sensor usually captures multiple second remote sensing images, and the processing process of each second remote sensing image is the same.

示例地，第一传感器和第二传感器不同。比如，第一传感器和第二传感器为不同卫星上的传感器。Illustratively, the first sensor and the second sensor are different. For example, the first sensor and the second sensor are sensors on different satellites.

示例地，第一传感器和第二传感器也可以相同，但是采集的时刻不同。For example, the first sensor and the second sensor may also be the same, but the acquisition moments are different.

示例地，第一林木区域和第二林木区域不同，从而实现跨区域的识别。For example, the first forest area and the second forest area are different, so as to realize cross-area identification.

示例地，上述目标图像的尺寸可以为17×17像素。For example, the size of the above target image may be 17×17 pixels.

示例地，第一林木区域和第二林木区域包括属于目标类别的林木比如油棕榈树。当然，油棕榈树仅仅作为示例，并不构成具体限定，关于源域图像和目标域图像可以结合实际需求确定，本发明实施例对此不做具体限定。源域图像为上述源域中的图像，目标域图像为上述目标域中的图像。Illustratively, the first forest area and the second forest area include forest trees belonging to the target category, such as oil palm trees. Of course, the oil palm tree is only an example, and does not constitute a specific limitation. The source domain image and the target domain image may be determined according to actual requirements, which are not specifically limited in this embodiment of the present invention. The source domain image is the image in the above-mentioned source domain, and the target domain image is the image in the above-mentioned target domain.

关于源域和目标域构建的相关内容参见上文，此处不再赘述。For the content related to the construction of the source domain and the target domain, please refer to the above, and will not be repeated here.

步骤302，对于多个目标图像的各图像：将目标图像代入特征提取器进行特征提取处理，得到第一特征图；将第一特征图输入分类器中，得到对目标图像的分类结果；将第一特征图输入第一判别器中，得到目标图像属于源域的第一概率值。Step 302, for each image of a plurality of target images: substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map; inputting the first feature map into the classifier to obtain a classification result for the target image; A feature map is input into the first discriminator to obtain a first probability value that the target image belongs to the source domain.

特征提取器和分类器的详细描述参见上文，此处不再赘述。The detailed description of the feature extractor and the classifier is referred to above, and will not be repeated here.

步骤303，基于多个目标图像各自对应的目标检测分类结果和第一概率值，得到第一损失，其指示了分类结果的不确定度。Step 303 , obtaining a first loss, which indicates the uncertainty of the classification result, based on the target detection classification result and the first probability value corresponding to each of the plurality of target images.

根据一种可行的实现方式，对于多个目标图像的各图像，基于对应的第一概率值计算香农熵，基于香农熵计算注意力值；之后，基于注意力值和多个目标图像各自对应的注意力值和分类结果中各类别的概率值，确定第一损失。According to a feasible implementation manner, for each of the multiple target images, the Shannon entropy is calculated based on the corresponding first probability value, and the attention value is calculated based on the Shannon entropy; The attention value and the probability value of each category in the classification result determine the first loss.

第一损失对应上述熵损失L_E。需要说明的是，分类结果中某一类别相对其他的类别的概率越高，分类结果的不确定度越低；若分类结果中多个类别的概率差异较小，分类结果的不确定度越高。因此，设计第一损失的目的为降低分类器输出的分类结果的不确定性。The first loss corresponds to the entropy loss _LE described above. It should be noted that the higher the probability of a certain category in the classification result relative to other categories, the lower the uncertainty of the classification result; if the difference between the probabilities of multiple categories in the classification result is small, the higher the uncertainty of the classification result. . Therefore, the purpose of designing the first loss is to reduce the uncertainty of the classification result output by the classifier.

步骤304，基于各源域图像具有的标签、对应的分类结果，得到第二损失，其指示了分类器的分类误差。Step 304, based on the labels of each source domain image and the corresponding classification results, obtain a second loss, which indicates the classification error of the classifier.

第二损失对应上述分类损失

需要说明的是，分类损失越小，说明分类器预测出的源域图像的标签的概率越高，分类结果越准确。The second loss corresponds to the above classification loss

It should be noted that the smaller the classification loss, the higher the probability of the label of the source domain image predicted by the classifier, and the more accurate the classification result.

步骤305，基于多个目标图像各自对应的第一概率值，得到第三损失，其指示了基于第一特征图进行源域分类的误差。Step 305 , obtaining a third loss based on the first probability values corresponding to each of the multiple target images, which indicates the error of the source domain classification based on the first feature map.

第三损失对应上述深层特征域损失L_D。需要说明的是，深层特征域损失L_D越小，说明特征提取器针对源域图像和目标域图像各自输出的第一特征图进行源域分类的误差越小。The third loss corresponds to the above-mentioned deep feature domain loss _LD . It should be noted that, the smaller the loss _LD of the deep feature domain, the smaller the error of the source domain classification performed by the feature extractor for the first feature maps respectively output by the source domain image and the target domain image.

步骤306，基于第一损失、第二损失和第三损失，对特征提取器、分类器和第一判别器进行训练。In step 306, the feature extractor, the classifier and the first discriminator are trained based on the first loss, the second loss and the third loss.

根据一种可行的实现方式，特征提取器包括第一提取层、第二提取层、第二判别器；第一提取层用于对目标图像进行浅层特征的提取，得到第二特征图；第二判别器用于基于第二特征图，判断目标图像属于源域图像的第二概率值；第二提取层用于基于第二特征图和第二概率值进行深层特征的提取，得到第一特征图。According to a feasible implementation manner, the feature extractor includes a first extraction layer, a second extraction layer, and a second discriminator; the first extraction layer is used to extract shallow features of the target image to obtain a second feature map; The second discriminator is used to determine the second probability value of the target image belonging to the source domain image based on the second feature map; the second extraction layer is used to extract deep features based on the second feature map and the second probability value to obtain the first feature map. .

在一个例子中，特征提取器包括多个卷积块和池化层，第一提前层包括池化层前的多个卷积块，所述第二提取层包括所述池化层和之后的多个卷积块；单个卷积块包括卷积层、批归一化层、实例归一化层和激活层。详细内容参见上文，此处不再赘述。In one example, the feature extractor includes a plurality of convolutional blocks and pooling layers, the first advance layer includes a plurality of convolutional blocks before the pooling layer, and the second extraction layer includes the pooling layer and subsequent layers Multiple convolutional blocks; a single convolutional block includes convolutional layers, batch normalization layers, instance normalization layers, and activation layers. For details, refer to the above, and will not be repeated here.

进一步地，还包括：基于多个目标图像各自对应的第二概率值，得到第四损失，其指示了基于第二特征图进行源域分类的误差。Further, the method further includes: obtaining a fourth loss based on the second probability values corresponding to each of the plurality of target images, which indicates the error of the source domain classification based on the second feature map.

之后，基于第一损失、第二损失、第三损失和第四损失，对特征提取器、分类器和第一判别器进行训练，从而找到源域图像和目标域图像之间的迁移性较强的特征，构造迁移性较强的特征空间，使得特征提取器将源域图像和目标域图像映射到该特征空间中，进而通过分类器实现目标域图像的分类。After that, based on the first loss, the second loss, the third loss and the fourth loss, the feature extractor, the classifier and the first discriminator are trained, so as to find that the migration between the source domain image and the target domain image is strong , construct a feature space with strong mobility, so that the feature extractor maps the source domain image and the target domain image into the feature space, and then realizes the classification of the target domain image through the classifier.

通过以上技术方案可知，本实施例存在的有益效果是：It can be known from the above technical solutions that the beneficial effects of the present embodiment are:

通过训练特征提取器以使其学习源域图像和目标域图像可共用的特征分布，使得分类器可对目标域图像进行较为准确的分类，实现将源域图像的标签迁移到目标域图像；另外，由于源域图像和目标图像对应的林木区域不同，可实现跨区域的林木检测。By training the feature extractor to learn the feature distribution shared by the source domain image and the target domain image, the classifier can classify the target domain image more accurately, and transfer the label of the source domain image to the target domain image; , since the forest area corresponding to the source image and the target image is different, cross-area forest detection can be realized.

如图4所示，为本发明实施例提供的一种目标检测的方法。本发明实施例所提供的方法可应用在电子设备上，具体可以应用于服务器或一般计算机上。本实施例中，所述方法具体包括以下步骤：As shown in FIG. 4 , it is a target detection method provided by an embodiment of the present invention. The method provided by the embodiment of the present invention can be applied to an electronic device, and specifically can be applied to a server or a general computer. In this embodiment, the method specifically includes the following steps:

步骤401，获取待检测的目标图像。Step 401, acquiring a target image to be detected.

根据一种可行的实施方式，获取第三传感器对第三区域拍摄得到的图像，该图像作为目标图像。这里，目标图像所对应的林木区域和上述多个目标图像所对应的林木区域可以不同。According to a feasible implementation manner, an image captured by the third sensor on the third area is acquired, and the image is used as the target image. Here, the forest tree area corresponding to the target image and the forest tree area corresponding to the above-mentioned multiple target images may be different.

步骤402，对所述目标图像进行分割，确定多个子图。Step 402: Segment the target image to determine multiple sub-images.

根据一种可行的实施方式，通过滑动窗口对目标图像进行有重叠的划分，得到大小满足模型输入要求的多个子图，比如，17×17像素。According to a feasible implementation manner, overlapping divisions are performed on the target image through a sliding window to obtain multiple sub-images whose size meets the input requirements of the model, for example, 17×17 pixels.

步骤403，通过特征提取器和分类器，对所述多个子图分别进行检测分类，确定检测分类结果；所述特征提取器和分类器通过上述任一方法训练得到，所述检测分类结果包括多个目标框和所述多个目标框各自的类别。In step 403, the feature extractor and the classifier are used to detect and classify the multiple sub-graphs respectively, and the detection and classification results are determined; the feature extractor and the classifier are obtained by training any of the above methods, and the detection and classification results include multiple a target box and the respective categories of the plurality of target boxes.

如图5所示，对于多个子图的各图，将该子图依次输入特征提取器和分类器中，得到分类器针对多个子图的检测分类结果，包括各子图的目标框的坐标和类别，这里，各子图的目标框可以有多个。在实际应用中，分类器输出的是概率分布(包括多个类别各自对应的概率值)，将概率分布中最大概率值对应的类别作为检测分类结果中目标框的类别。As shown in Figure 5, for each of the multiple sub-images, the sub-images are input into the feature extractor and the classifier in turn to obtain the detection and classification results of the classifier for the multiple sub-images, including the coordinates of the target frame of each sub-image and the Category, here, there can be multiple target boxes for each subgraph. In practical applications, the output of the classifier is a probability distribution (including probability values corresponding to multiple categories), and the category corresponding to the maximum probability value in the probability distribution is used as the category of the target frame in the detection classification result.

步骤404，对属于相同类别的各目标框进行合并，确定所述目标图像的目标检测结果。Step 404: Merge the target frames belonging to the same category to determine the target detection result of the target image.

根据一种可行的实施方式，采用基于交并比(Intersection-Of-Union,IOU)的准则进行相同类别的目标框的合并，基于IOU的合并方法不需要迭代的步骤，可提高合并效率。比如，如果类别相同的两个目标框的IOU大于等于给定的阈值，将这两个目标框的坐标求平均。在实际应用中，对IOU大于等于给定的阈值的多个目标框进行合并即可；合并后的目标框的计算公式参见如下公式(10)。According to a feasible implementation manner, a criterion based on Intersection-Of-Union (IOU) is used to merge target frames of the same category, and the IOU-based merging method does not require iterative steps, which can improve the merging efficiency. For example, if the IOU of two target boxes with the same category is greater than or equal to a given threshold, the coordinates of the two target boxes are averaged. In practical applications, it is sufficient to merge multiple target frames with an IOU greater than or equal to a given threshold; the calculation formula of the merged target frame is shown in the following formula (10).

其中，(X_lt，Y_lt)代表合并后的目标框的左上角的坐标；(X_rb，Y_rb)代表合并后的目标框的右下角的坐标；n代表与IOU大于阈值的多个目标框的数量；(x_lt，i，y_lt，i)代表IOU大于阈值的多个目标框中的第i个目标框的左上角的坐标；(x_rb，i，y_rb，i)代表IOU大于阈值的多个目标框中的第i个目标框的右下角的坐标。Among them, (X _lt , Y _lt ) represents the coordinates of the upper left corner of the combined target frame; (X _rb , Y _rb ) represents the coordinates of the lower right corner of the combined target frame; n represents multiple targets with IOU greater than the threshold The number of boxes; (x _{lt, i} , y _{lt, i} ) represents the coordinates of the upper left corner of the ith target box in multiple target boxes with IOU greater than the threshold; (x _{rb, i} , y _{rb, i} ) represents the IOU The coordinates of the lower right corner of the ith target box in multiple target boxes greater than the threshold.

在实际应用中，可以仅仅对指定类别比如油棕榈树类别的目标框进行合并。In practical applications, only the target boxes of the specified category, such as the oil palm tree category, can be merged.

对目标图像进行分割，并对分割后的各图像分别进行检测分类，之后合并类型相同的目标框，提高图像的目标检测结果的准确性。The target image is segmented, and each segmented image is detected and classified separately, and then target frames of the same type are combined to improve the accuracy of the target detection result of the image.

基于与本发明方法实施例相同的构思，请参考图6，本发明实施例还提供了一种模型训练的装置，包括：Based on the same concept as the method embodiment of the present invention, please refer to FIG. 6 , the embodiment of the present invention further provides a model training device, including:

图像获取模块601，用于基于对不同林木区域分别拍摄得到的遥感图像，得到多个目标图像，所述多个目标图像包括带标签的源域图像和无标签的目标域图像，所述源域图像和所述目标域图像所对应的林木区域不同；The image acquisition module 601 is used to obtain a plurality of target images based on remote sensing images obtained by shooting different forest areas respectively, and the plurality of target images include a labeled source domain image and an unlabeled target domain image. The image and the forest area corresponding to the target domain image are different;

分类模块602，用于对于所述多个目标图像的各图像：将所述目标图像代入特征提取器进行特征提取处理，得到第一特征图；将所述第一特征图输入分类器中，得到对所述目标图像的分类结果；将所述第一特征图输入第一判别器中，得到所述目标图像属于源域的第一概率值；The classification module 602 is used for each image of the plurality of target images: substitute the target image into a feature extractor to perform feature extraction processing to obtain a first feature map; input the first feature map into the classifier to obtain Classification result of the target image; inputting the first feature map into a first discriminator to obtain a first probability value that the target image belongs to the source domain;

第一损失计算模块603，用于基于所述多个目标图像各自对应的目标检测分类结果和所述第一概率值，得到第一损失，其指示了所述分类结果的不确定度；a first loss calculation module 603, configured to obtain a first loss based on the respective target detection classification results and the first probability value corresponding to the multiple target images, which indicates the uncertainty of the classification result;

第二损失计算模块604，用于基于所述源域的各图像具有的标签、对应的分类结果，得到第二损失，其指示了所述分类器的分类误差；The second loss calculation module 604 is configured to obtain a second loss based on the label and the corresponding classification result of each image in the source domain, which indicates the classification error of the classifier;

第三损失计算模块605，用于基于所述多个目标图像各自对应的第一概率值，得到第三损失，其指示了基于第一特征图进行源域分类的误差；A third loss calculation module 605, configured to obtain a third loss based on the respective first probability values corresponding to the multiple target images, which indicates the error of the source domain classification based on the first feature map;

训练模块606，用于基于所述第一损失、所述第二损失和所述第三损失，对所述特征提取器、分类器和第一判别器进行训练。A training module 606, configured to train the feature extractor, the classifier and the first discriminator based on the first loss, the second loss and the third loss.

根据一种可行的实施方式，所述多个目标图像基于对不同林木区域分别拍摄得到的遥感图像进行分割得到；According to a feasible implementation manner, the multiple target images are obtained by segmenting remote sensing images obtained from different forest areas;

所述各源域图像具有的标签有多种，至少包括所述林木类别、林木间区域；Each source domain image has a variety of labels, including at least the forest tree category and the area between trees;

所述不同林木区域分别包括属于目标林木类别的林木；The different forest tree areas respectively include forest trees belonging to the target forest tree category;

所述目标域图像和所述源域图像来自不同的传感器；和/或，the target domain image and the source domain image are from different sensors; and/or,

所述目标域图像和所述源域图像各自对应的拍摄季节不同。The corresponding shooting seasons of the target domain image and the source domain image are different.

根据一种可行的实施方式，所述特征提取器包括第一提取层、第二提取层、第二判别器；所述第一提取层用于对所述目标图像进行浅层特征的提取，得到第二特征图；所述第二判别器用于基于所述第二特征图，判断所述目标图像属于所述源域的第二概率值；所述第二提取层用于基于所述第二特征图和所述第二概率值进行深层特征的提取，得到第一特征图。According to a feasible implementation manner, the feature extractor includes a first extraction layer, a second extraction layer, and a second discriminator; the first extraction layer is used to extract shallow features on the target image, and obtain the second feature map; the second discriminator is used to determine the second probability value of the target image belonging to the source domain based on the second feature map; the second extraction layer is used to determine the second probability value based on the second feature The deep feature extraction is performed on the map and the second probability value to obtain a first feature map.

在一个例子中，所述特征提取器包括多个卷积块和池化层，所述第一提前层包括池化层前的多个卷积块，所述第二提取层包括所述池化层和之后的多个卷积块；In one example, the feature extractor includes a plurality of convolutional blocks and a pooling layer, the first advance layer includes a plurality of convolutional blocks preceding the pooling layer, and the second extraction layer includes the pooling layer layer and multiple convolutional blocks after;

单个卷积块包括卷积层、批归一化层、实例归一化层和激活层。A single convolutional block includes convolutional layers, batch normalization layers, instance normalization layers, and activation layers.

在一个例子中，所述第二提取层用于基于所述第二概率值计算熵，基于计算得到的熵值确定特征注意力值，基于所述特征注意力值和所述第二特征图进行深层特征的提取，得到第一特征图。In one example, the second extraction layer is configured to calculate entropy based on the second probability value, determine a feature attention value based on the calculated entropy value, and perform a feature attention value based on the feature attention value and the second feature map. Extraction of deep features to obtain a first feature map.

根据一种可行的实施方式，所述装置还包括：第三损失计算模块；其中，According to a feasible implementation manner, the apparatus further includes: a third loss calculation module; wherein,

所述第三损失计算模块，用于基于所述多个目标图像各自对应的第二概率值，得到第四损失，其指示了基于第二特征图进行源域分类的误差。The third loss calculation module is configured to obtain a fourth loss based on the respective second probability values corresponding to the multiple target images, which indicates the error of the source domain classification based on the second feature map.

所述训练模块606，用于基于所述第一损失、所述第二损失、所述第三损失和第四损失，对所述特征提取器、分类器和第一判别器进行训练。The training module 606 is configured to train the feature extractor, the classifier and the first discriminator based on the first loss, the second loss, the third loss and the fourth loss.

根据一种可行的实施方式，所述第一损失计算模块603，包括：注意力计算单元和损失计算单元；其中，According to a feasible implementation manner, the first loss calculation module 603 includes: an attention calculation unit and a loss calculation unit; wherein,

所述注意力计算单元，用于对于所述多个目标图像的各图像，基于对应的第一概率值计算熵，基于计算得到的熵值得到熵注意力值，所述熵注意力值指示了对所述目标图像的注意程度；The attention calculation unit is configured to, for each of the multiple target images, calculate entropy based on the corresponding first probability value, and obtain an entropy attention value based on the calculated entropy value, where the entropy attention value indicates a the degree of attention to the target image;

所述损失计算单元，用于基于所述多个目标图像各自对应的熵注意力值和所述分类结果中各类别的概率值，确定第一损失。The loss calculation unit is configured to determine the first loss based on the respective corresponding entropy attention values of the multiple target images and the probability values of each category in the classification result.

基于与本发明方法实施例相同的构思，请参考图7，本发明实施例还提供了一种目标检测的装置，包括：Based on the same concept as the method embodiment of the present invention, please refer to FIG. 7 , the embodiment of the present invention further provides a device for target detection, including:

图像获取模块701，用于获取待检测的目标图像；An image acquisition module 701, configured to acquire a target image to be detected;

分割模块702，用于对所述目标图像进行分割，确定多个子图；A segmentation module 702, configured to segment the target image to determine multiple sub-images;

分类模块703，用于通过特征提取器和分类器，对所述多个子图分别进行检测分类，确定检测分类结果；所述特征提取器和分类器通过上述第一方面中任一所述的方法训练得到，所述检测分类结果包括多个目标框和所述多个目标框各自的类别；The classification module 703 is configured to detect and classify the plurality of sub-images respectively through the feature extractor and the classifier, and determine the detection and classification result; the feature extractor and the classifier use any one of the methods described in the first aspect above. Obtained through training, the detection and classification result includes multiple target frames and respective categories of the multiple target frames;

合并模块704，用于对属于相同类别的各目标框进行合并，确定所述目标图像的目标检测结果。The merging module 704 is used for merging each target frame belonging to the same category to determine the target detection result of the target image.

图8是本发明实施例提供的一种电子设备的结构示意图。在硬件层面，该电子设备包括处理器801以及存储有执行指令的存储器802，可选地还包括内部总线803及网络接口804。其中，存储器802可能包含内存8021，例如高速随机存取存储器(Random-AccessMemory，RAM)，也可能还包括非易失性存储器8022(non-volatile memory)，例如至少1个磁盘存储器等；处理器801、网络接口804和存储器802可以通过内部总线803相互连接，该内部总线803可以是ISA(Industry Standard Architecture，工业标准体系结构)总线、PCI(Peripheral Component Interconnect，外设部件互连标准)总线或EISA(ExtendedIndustry Standard Architecture，扩展工业标准结构)总线等；内部总线803可以分为地址总线、数据总线、控制总线等，为便于表示，图8中仅用一个双向箭头表示，但并不表示仅有一根总线或一种类型的总线。当然，该电子设备还可能包括其他业务所需要的硬件。当处理器801执行存储器802存储的执行指令时，处理器801执行本发明任意一个实施例中的方法，并至少用于执行如图3或图4所示的方法。FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. At the hardware level, the electronic device includes a processor 801 and a memory 802 storing execution instructions, and optionally an internal bus 803 and a network interface 804 . The memory 802 may include a memory 8021, such as a high-speed random-access memory (Random-Access Memory, RAM), and may also include a non-volatile memory 8022 (non-volatile memory), such as at least one disk memory, etc.; the processor 801, the network interface 804 and the memory 802 can be connected to each other through an internal bus 803, and the internal bus 803 can be an ISA (Industry Standard Architecture, industry standard architecture) bus, a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) bus or EISA (Extended Industry Standard Architecture, Extended Industry Standard Architecture) bus, etc.; the internal bus 803 can be divided into address bus, data bus, control bus, etc. For convenience, only one bidirectional arrow is used in FIG. 8, but it does not mean that there is only one Root bus or a type of bus. Of course, the electronic equipment may also include hardware required for other services. When the processor 801 executes the execution instructions stored in the memory 802, the processor 801 executes the method in any one of the embodiments of the present invention, and is at least configured to execute the method shown in FIG. 3 or FIG. 4 .

在一种可能实现的方式中，处理器从非易失性存储器中读取对应的执行指令到内存中然后运行，也可从其它设备上获取相应的执行指令，以在逻辑层面上形成一种模型训练的装置或目标检测的装置。处理器执行存储器所存放的执行指令，以通过执行的执行指令实现本发明任实施例中提供的一种模型训练的方法或目标检测的方法。In a possible implementation manner, the processor reads the corresponding execution instruction from the non-volatile memory into the memory and then executes it, and also obtains the corresponding execution instruction from other devices, so as to form a logic level A device for model training or a device for object detection. The processor executes the execution instructions stored in the memory, so as to implement a model training method or a target detection method provided in any embodiment of the present invention through the executed execution instructions.

处理器可能是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器，包括中央处理器(Central Processing Unit，CPU)、网络处理器(Network Processor，NP)等；还可以是数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现场可编程门阵列(Field－Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本发明实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。A processor may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software. The above-mentioned processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), dedicated integrated Circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. Various methods, steps, and logical block diagrams disclosed in the embodiments of the present invention can be implemented or executed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

本发明实施例还提供了一种计算机可读存储介质，包括执行指令，当电子设备的处理器执行执行指令时，所述处理器执行本发明任意一个实施例中提供的方法。该电子设备具体可以是如图8所示的电子设备；执行指令是一种模型训练的装置或目标检测的装置所对应计算机程序。Embodiments of the present invention further provide a computer-readable storage medium, including execution instructions. When a processor of an electronic device executes the execution instructions, the processor executes the method provided in any one of the embodiments of the present invention. Specifically, the electronic device may be the electronic device shown in FIG. 8 ; the execution instruction is a computer program corresponding to an apparatus for model training or an apparatus for target detection.

本领域内的技术人员应明白，本发明的实施例可提供为方法或计算机程序产品。因此，本发明可采用完全硬件实施例、完全软件实施例，或软件和硬件相结合的形式。As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware.

本发明中的各个实施例均采用递进的方式描述，各个实施例之间相同相似的部分互相参见即可，每个实施例重点说明的都是与其他实施例的不同之处。尤其，对于装置实施例而言，由于其基本相似于方法实施例，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。Each embodiment of the present invention is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for related parts.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprising", "comprising" or any other variation thereof are intended to encompass a non-exclusive inclusion such that a process, method, article or device comprising a series of elements includes not only those elements, but also Other elements not expressly listed, or which are inherent to such a process, method, article of manufacture, or apparatus are also included. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article of manufacture, or device that includes the element.

以上所述仅为本发明的实施例而已，并不用于限制本发明。对于本领域技术人员来说，本发明可以有各种更改和变化。凡在本发明的精神和原理之内所作的任何修改、等同替换、改进等，均应包含在本发明的权利要求范围之内。The above descriptions are merely embodiments of the present invention, and are not intended to limit the present invention. Various modifications and variations of the present invention are possible for those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included within the scope of the claims of the present invention.

Claims

1. A method of model training, comprising:

obtaining a plurality of target images based on remote sensing images obtained by respectively shooting different forest regions, wherein the target images comprise source domain images with labels and target domain images without labels, and the forest regions corresponding to the source domain images and the target domain images are different;

for each image of the plurality of target images:

substituting the target image into a feature extractor to perform feature extraction processing to obtain a first feature map;

inputting the first feature map into a classifier to obtain a classification result of the target image;

inputting the first feature map into a first discriminator to obtain a first probability value of the target image belonging to a source domain;

obtaining a first loss indicating an uncertainty of the classification result based on the target detection classification result and the first probability value corresponding to each of the plurality of target images;

obtaining a second loss which indicates the classification error of the classifier based on the label of each source domain image and the corresponding classification result;

obtaining a third loss indicating an error of source domain classification based on the first feature map based on the first probability values corresponding to the target images respectively;

training the feature extractor, classifier, and first discriminator based on the first loss, the second loss, and the third loss.

2. The method according to claim 1, wherein the plurality of target images are obtained by segmenting based on remote sensing images respectively captured in different forest regions;

the source domain images have various labels, and at least comprise the forest type and the forest region;

the different forest regions respectively comprise forests belonging to the target forest category;

the target domain image and the source domain image are from different sensors; and/or the presence of a gas in the atmosphere,

the shooting seasons corresponding to the target domain image and the source domain image are different.

3. The method of claim 1, wherein the feature extractor comprises a first extraction layer, a second discriminator;

the first extraction layer is used for extracting shallow features of the target image to obtain a second feature map;

the second discriminator is used for judging a second probability value of the target image belonging to the source domain based on the second feature map;

the second extraction layer is used for extracting deep features based on the second feature map and the second probability value to obtain a first feature map.

4. The method of claim 3, wherein the feature extractor comprises a plurality of volume blocks and a pooling layer, the first extraction layer comprises a plurality of volume blocks before the pooling layer, and the second extraction layer comprises the pooling layer and a plurality of volume blocks after;

a single volume block includes a volume layer, a batch normalization layer, an instance normalization layer, and an activation layer.

5. The method of claim 3, wherein the second extraction layer is configured to calculate entropy based on the second probability value, determine a feature attention value based on the calculated entropy value, and extract deep features based on the feature attention value and the second feature map to obtain the first feature map.

6. The method of claim 3, further comprising:

obtaining a fourth loss based on the second probability values corresponding to the target images respectively, wherein the fourth loss indicates an error of source domain classification based on a second feature map;

the training the feature extractor, the classifier, and the first discriminator based on the first loss, the second loss, and the third loss includes:

training the feature extractor, classifier, and first discriminator based on the first loss, the second loss, the third loss, and a fourth loss.

7. The method of claim 1, wherein determining a first loss based on the target detection classification result and the first probability value for each of the plurality of target images comprises:

for each image of the plurality of target images, calculating entropy based on the corresponding first probability value, and deriving an entropy attention value based on the calculated entropy value, the entropy attention value indicating a degree of attention to the target image;

and determining a first loss based on the entropy attention values corresponding to the target images and the probability values of the categories in the classification result.

8. A method of target detection, comprising:

acquiring a target image to be detected;

segmenting the target image and determining a plurality of sub-images;

respectively carrying out detection classification on the multiple subgraphs through a feature extractor and a classifier, and determining a detection classification result; the feature extractor and the classifier are trained by the method of any one of claims 1 to 7, and the detection classification result comprises a plurality of target frames and respective categories of the target frames;

and merging the target frames belonging to the same category, and determining a target detection result of the target image.

9. A computer-readable storage medium comprising executable instructions that, when executed by a processor of an electronic device, cause the processor to perform the method of any of claims 1 to 7, or the method of claim 8.

10. A computing device comprising a processor and a memory storing execution instructions, the processor performing the method of any of claims 1 to 7, or the method of claim 8, when the processor executes the execution instructions stored by the memory.