
CN112016592B - Domain adaptive semantic segmentation method and device based on cross domain category perception - Google Patents

Domain adaptive semantic segmentation method and device based on cross domain category perception

Info

Publication number
CN112016592B
CN112016592B (granted publication of application CN202010773728.3A)
Authority
CN
China
Prior art keywords
feature
feature map
attention
map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010773728.3A
Other languages
Chinese (zh)
Other versions
CN112016592A (en)
Inventor
李仕仁
王金桥
朱贵波
胡建国
张海
赵朝阳
林格
谭大伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nexwise Intelligence China Ltd
Original Assignee
Nexwise Intelligence China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nexwise Intelligence China Ltd filed Critical Nexwise Intelligence China Ltd
Priority to CN202010773728.3A priority Critical patent/CN112016592B/en
Publication of CN112016592A publication Critical patent/CN112016592A/en
Application granted granted Critical
Publication of CN112016592B publication Critical patent/CN112016592B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a domain-adaptive semantic segmentation method and device based on cross domain class perception. The method includes: converting the style of the source image into the style of the target image, then extracting and classifying features of each; inputting the feature maps and classification score maps into a cross domain class perception module; adjusting the class centers of the feature maps through the cross domain class center generators of the two cross domain class perceptrons so that the class centers of the two feature maps approach each other; and adjusting the classification-ambiguous feature points of the feature maps through the class attention modules to obtain a first attention feature map and a second attention feature map for semantic segmentation. When extracting features of one domain, the model of the embodiment attends to the class centers of the other domain's data features and, combined with an attention mechanism, adjusts the classification-ambiguous pixel features in both domains, so that the class centers of same-class features in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is achieved.

Description

Domain adaptive semantic segmentation method and device based on cross domain category perception
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a field-adaptive semantic segmentation method and device based on cross field category perception.
Background
Labeling semantic segmentation data requires substantial manual effort. Real data sets for semantic segmentation therefore typically contain only a small number of samples, which limits the generalization of models to diverse real-world cases. A common solution is unsupervised semantic segmentation: a model trained on computer-synthesized data sets is applied to data sets of similar real scenes. To reduce the loss of real feature information, a domain adaptation method is needed to reduce the difference in the feature-space distributions of images from data sets in different domains. Traditional domain adaptation methods typically consider how to migrate knowledge from the computer-synthesized domain to the real scene, without considering what knowledge is migrated; in short, they address "how to adapt" but not "what to adapt".
The image content of different domains has some similarity; for example, the categories within the pictures are approximately the same. Thus, the feature spaces of the same class in different domain data sets, extracted with the same model, should be similar, and so should the class centers. However, there is often a difference between the feature distributions of the same class in data sets of real scenes and of computer-synthesized scenes. Therefore, how to achieve domain adaptation by reducing the difference between the feature distributions of different domains is a problem to be solved.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiment of the invention provides a field-adaptive semantic segmentation method and device based on cross field category perception.
In a first aspect, an embodiment of the present invention provides a domain-adaptive semantic segmentation method based on cross domain class perception, the method including: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptation image, where the label data of the source adaptation image is consistent with that of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map into a cross domain class perception module, where the cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence; adjusting the class centers of the first feature map and the second feature map through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that the class centers of the first feature map and the second feature map approach each other; adjusting the distribution of the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules, respectively, to obtain a first attention feature map and a second attention feature map; and performing semantic segmentation on the source image according to the first attention feature map and on the target image according to the second attention feature map.
Further, the adjusting of the class centers of the first feature map and the second feature map through the cross domain class center generators of the two cross domain class perceptrons specifically includes: performing an inner product operation on the first classification score map and the second feature map to obtain the adjusted class centers of the first feature map; and performing an inner product operation on the second classification score map and the first feature map to obtain the adjusted class centers of the second feature map.
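The inner-product operation between a classification score map and the opposite domain's feature map can be sketched in NumPy. This is an illustrative reconstruction, not the patent's implementation: the array shapes, the one-hot form of the score map, and the per-class averaging are assumptions.

```python
import numpy as np

def cross_domain_class_centers(score_map, other_features):
    """Sketch of a cross-domain class center generator (assumed shapes).

    score_map:      (C, H*W) one-hot score map; entry (i, j) is 1 iff pixel j
                    is assigned to class i by THIS domain's classifier.
    other_features: (H*W, N) per-pixel features from the OTHER domain.

    Returns (C, N): for each class, the mean feature of the other domain's
    pixels assigned to that class -- the "inner product" of the score map
    with the opposite domain's feature map.
    """
    counts = score_map.sum(axis=1, keepdims=True)   # (C, 1) pixels per class
    sums = score_map @ other_features               # (C, N) inner product
    return sums / np.maximum(counts, 1)             # mean feature per class
```

Because the source score map is multiplied with the target features (and vice versa), each domain's class centers are pulled toward the other domain's feature statistics, which is the cross-domain perception the claim describes.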
Further, the adjusted class center of the first feature map is expressed as:

$$C_s^i = \frac{\sum_{j=1}^{H \times W} [G_{c1}(F_1)]_{i,j} \, [A_2]_j}{\sum_{j=1}^{H \times W} [G_{c1}(F_1)]_{i,j}}$$

and the adjusted class center of the second feature map is expressed as:

$$C_t^i = \frac{\sum_{j=1}^{H \times W} [G_{c2}(F_2)]_{i,j} \, [A_1]_j}{\sum_{j=1}^{H \times W} [G_{c2}(F_2)]_{i,j}}$$

where C_s^i denotes the class center of the i-th class of the source data, H denotes the feature height, W the feature width, and j the pixel index; G_c1(F_1) denotes the first classification score map, and [G_c1(F_1)]_{i,j} indicates whether the j-th pixel in the first classification score map belongs to the i-th class (1 if it does, 0 otherwise); [A_2]_j denotes the feature distribution of the j-th pixel in the second feature map. C_t^i denotes the class center of the i-th class of the target data; G_c2(F_2) denotes the second classification score map, and [G_c2(F_2)]_{i,j} indicates whether the j-th pixel in the second classification score map belongs to the i-th class (1 if it does, 0 otherwise); [A_1]_j denotes the feature distribution of the j-th pixel in the first feature map.
Further, the adjusting of the distribution of the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules to obtain the first attention feature map and the second attention feature map specifically includes: taking the first classification score map as an attention map and performing an inner product operation with the adjusted class centers of the source data to obtain a first class attention feature; performing channel-wise addition of the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map and performing an inner product operation with the adjusted class centers of the target data to obtain a second class attention feature; and performing channel-wise addition of the second class attention feature and the second feature map to obtain the second attention feature map.
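The class attention step (score map used as attention over the adjusted class centers, followed by channel-wise addition) can be sketched as follows. Shapes and the one-hot score map are assumptions made for illustration.

```python
import numpy as np

def class_attention(features, score_map, centers):
    """Sketch of a class attention module (assumed shapes).

    features:  (N, H*W) this domain's feature map
    score_map: (C, H*W) classification score map, used as the attention map
    centers:   (C, N)   adjusted class centers from the center generator

    Each pixel receives the score-weighted sum of class centers (the class
    attention feature), which is then added channel-wise to the features.
    """
    attention = centers.T @ score_map   # (N, H*W): sum_i score[i,j]*centers[i,k]
    return features + attention         # channel-wise addition
```

Pixels whose scores spread over several classes receive a blend of several class centers, which is how the module nudges classification-ambiguous points toward the shared centers.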
Further, the first attention feature map is expressed as:

$$[Z_1]_{k,j} = [F_1]_{k,j} + \sum_{i=1}^{C_1} [G_{c1}(F_1)]_{i,j} \, [C_s^i]_k$$

where [Z_1]_{k,j} denotes the first attention feature of the j-th pixel of the k-th channel of the source image, C_1 denotes the number of classes of the source image, i denotes the class index, G_c1(F_1) denotes the first classification score map, [G_c1(F_1)]_{i,j} indicates whether the j-th pixel in the first classification score map belongs to the i-th class (1 if it does, 0 otherwise), [C_s^i]_k denotes the k-th channel component of the i-th class center of the source data, and [F_1]_{k,j} denotes the feature of the j-th pixel of the k-th channel of the first feature map;

the second attention feature map is expressed as:

$$[Z_2]_{k,j} = [F_2]_{k,j} + \sum_{i=1}^{C_2} [G_{c2}(F_2)]_{i,j} \, [C_t^i]_k$$

where [Z_2]_{k,j} denotes the second attention feature of the j-th pixel of the k-th channel of the target image, C_2 denotes the number of classes of the target image, i denotes the class index, G_c2(F_2) denotes the second classification score map, [G_c2(F_2)]_{i,j} indicates whether the j-th pixel in the second classification score map belongs to the i-th class (1 if it does, 0 otherwise), [C_t^i]_k denotes the k-th channel component of the i-th class center of the target data, and [F_2]_{k,j} denotes the feature of the j-th pixel of the k-th channel of the second feature map.
Further, the method further includes: refining the first attention feature map and the second attention feature map with a 1 × 1 convolution layer.
Further, before processing the source adaptation image sequentially through the first feature extraction network and the first classifier, the method further includes: performing channel compression on the source adaptation image; and before processing the target image sequentially through the second feature extraction network and the second classifier, the method further includes: performing channel compression on the target image.
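Both the channel compression mentioned here and the 1 × 1 refinement above rely on the fact that a 1 × 1 convolution is a per-pixel linear map that mixes channels without mixing spatial positions. A minimal NumPy sketch (the weight matrix is hypothetical; in the embodiment it would be learned):

```python
import numpy as np

def conv1x1(x, weight):
    """A 1x1 convolution applied to a feature map.

    x:      (N, H, W)  input feature map
    weight: (N_out, N) the same projection matrix applied at every pixel
    Returns (N_out, H, W); N_out < N gives channel compression.
    """
    n, h, w = x.shape
    # Flatten spatial dims, project channels, restore spatial dims.
    return (weight @ x.reshape(n, h * w)).reshape(-1, h, w)
```

This is why a 1 × 1 layer is the standard cheap way to reduce N channels to N' before the heavier cross-domain computation.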
In a second aspect, an embodiment of the present invention provides a domain-adaptive semantic segmentation apparatus based on cross domain class perception, the apparatus including: a preprocessing module, configured to convert the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptation image, where the label data of the source adaptation image is consistent with that of the source image; a feature classification module, configured to process the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map, and to process the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; a feature map adjusting module, configured to input the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, where the cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that they approach each other, and the distribution of the classification-ambiguous feature points of the first feature map and the second feature map is adjusted through the class attention modules, respectively, to obtain a first attention feature map and a second attention feature map; and a semantic segmentation module, configured to perform semantic segmentation on the source image according to the first attention feature map and on the target image according to the second attention feature map.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first aspect.
According to the domain-adaptive semantic segmentation method and device based on cross domain class perception, by providing a cross domain class perception module comprising a cross domain class center generator and a class attention module, the model, when extracting features of one domain, attends to the class centers of the other domain's data features; combined with the attention mechanism, the classification-ambiguous pixel features of the two domains are adjusted so that the class centers of same-class features in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a domain-adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a domain-adaptive semantic segmentation method based on cross domain class awareness according to an embodiment of the present invention;
FIG. 3 is a schematic process flow diagram of a domain-adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a processing procedure of a cross domain class awareness module in a domain adaptive semantic segmentation method based on cross domain class awareness according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a domain-adaptive semantic segmentation device based on cross domain class awareness according to an embodiment of the present invention;
fig. 6 illustrates a physical structure diagram of an electronic device.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a domain adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 101, converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptation image; wherein the source adaptation image and the label data of the source image are consistent.
Semantic segmentation is a typical computer vision problem: raw data (e.g., planar images) are taken as input and converted into a mask whose regions of interest are highlighted. The term full-pixel semantic segmentation is often used, meaning that each pixel in an image is assigned a class ID according to the object of interest to which it belongs.
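The full-pixel assignment described here amounts to an arg-max over per-class scores at every pixel. A minimal sketch, assuming the score map has already been computed by a classifier:

```python
import numpy as np

def segment(score_map):
    """Full-pixel semantic segmentation: assign each pixel the class
    with the highest classification score.

    score_map: (C, H, W) per-class scores
    Returns (H, W) integer class-ID mask.
    """
    return score_map.argmax(axis=0)
```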
The domain-adaptive semantic segmentation method based on cross domain class perception provided by the embodiment of the invention (which may also be called a domain-adaptive semantic segmentation method based on cross domain class perception adversarial learning) is mainly used for domain adaptation of a model between a computer-synthesized data set and a real-scene data set, and aims to solve the segmentation problem of a target data set that has no label data.
In the embodiment of the invention, the source data set is composed of labeled images, called source images; the target data set is composed of unlabeled images, called target images. On the one hand, the labeled data set can assist in achieving accurate semantic segmentation of the unlabeled data set; on the other hand, knowledge learned from the target data set can be migrated into the model training on the source data set, so that the class centers of the source and target data sets approach each other and more accurate semantic segmentation is achieved for images in both data sets.
Because the processing of the source image in the source data set and of the target image in the target data set is similar in the embodiment of the invention, the source image and the target image need a unified style so that the class centers of their feature maps can approach each other. In addition, since one purpose is to use the source image to train the semantic segmentation model for the target image, the style of the source image in the source data set is converted into the style of the target image in the target data set through the style migration network to obtain the source adaptation image, whose label data is consistent with that of the source image.
Step 102, processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; and processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map.
After the styles are unified, feature extraction is performed on the source adaptation image through the first feature extraction network to obtain the first feature map; the first feature map is then input to the first classifier to obtain the first classification score map, which contains the classification score of each pixel of the source image. Feature extraction is performed on the target image through the second feature extraction network to obtain the second feature map; the second feature map is then input to the second classifier to obtain the second classification score map, which contains the classification score of each pixel of the target image.
Step 103, inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module; the cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence; the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that the class centers of the first feature map and the second feature map approach each other; and the distribution of the classification-ambiguous feature points of the first feature map and the second feature map is adjusted through the class attention modules, respectively, to obtain the first attention feature map and the second attention feature map.
The embodiment of the invention trains the model with a labeled source data set and an unlabeled target data set. Because the source data set has label data, a model that classifies source-data features can be trained; but because the class features of same-class target data differ somewhat from those of the source data, a model trained only on source data classifies target-data features poorly. The embodiment of the invention therefore provides a cross domain class perception module, which makes the class centers of same-class target data approximately the same as those of the source data, so that the model learned from the supervised source data is applicable to classifying the target data. Since the feature distributions of the same class in different domains differ, the embodiment makes the model cross-perceive the feature distribution of the opposite domain when extracting features, so that the class centers of the different domains approach each other and the feature distributions of the same class finally become consistent.
Specifically, the first feature map, the first classification score map, the second feature map and the second classification score map are input to the cross domain class perception module. The cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence. The class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that they approach each other: the class center of the first feature map is adjusted through the cross domain class center generator of one perceptron, and the class center of the second feature map through that of the other. In the classification score maps, the class scores of some pixels are relatively close, i.e., the classifier is relatively uncertain about which class those pixels belong to, so they are more likely to be misclassified. These points need more attention and should be emphasized in subsequent processing.
The distribution of the classification-ambiguous feature points of the first feature map and the second feature map is then adjusted through the class attention modules, respectively, according to the adjusted class centers and using an attention mechanism, obtaining the first attention feature map and the second attention feature map. That is, the classification-ambiguous feature points of the first feature map are adjusted through the class attention module of one of the two cross domain class perceptrons, and those of the second feature map through the class attention module of the other.
Step 104, performing semantic segmentation on the source image according to the first attention feature map and on the target image according to the second attention feature map.
Semantic segmentation is performed on the source image and the target image according to the first and second attention feature maps, respectively. The first and second attention feature maps are still feature maps in nature, so existing methods for semantic segmentation from feature maps can be used: segmenting according to the first attention feature map yields the source-image segmentation result, and segmenting according to the second attention feature map yields the target-image segmentation result.
By providing the cross domain class perception module comprising the cross domain class center generator and the class attention module, the embodiment of the invention makes the model, when extracting features of one domain, attend to the class centers of the other domain's data features, and combines an attention mechanism to adjust the classification-ambiguous pixel features in the two domains, so that the class centers of same-class features in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is achieved.
Fig. 2 is a schematic diagram of the domain-adaptive semantic segmentation method based on cross domain class perception according to an embodiment of the present invention. As shown in Fig. 2, the style of the source data set image is first converted to the style of the target data set image through a style migration network. The label data of the image after style migration is consistent with that of the source image, and the result is called the source adaptation image. The resulting source adaptation image A_{s→t} is then input to feature extraction network G_{f1} (the first feature extraction network) for feature extraction, obtaining the first feature map F_{s→t} (denoted F_1 in Figs. 3 and 4), which is passed through classifier G_{c1} (the first classifier) to obtain the first classification score map G_{c1}(F_{s→t}) (denoted G_{c1}(F_1) in Figs. 3 and 4). The target image A_t is input to feature extraction network G_{f2} (the second feature extraction network) for feature extraction, obtaining the second feature map F_t (denoted F_2 in Figs. 3 and 4), which is passed through classifier G_{c2} (the second classifier) to obtain the second classification score map G_{c2}(F_t) (denoted G_{c2}(F_2) in Figs. 3 and 4). The obtained feature maps and classification score maps are input to the constructed cross domain class perception module CDCAM, yielding the first attention feature map Z_{s→t} (denoted Z_1 in Figs. 3 and 4) and the second attention feature map Z_t (denoted Z_2 in Fig. 3).
The main task of the cross domain class perception module is mutual perception based on the class score maps of the source and target data and the features extracted from the opposite domain, promoting mutual adaptation of the class centers of the two domains. Specifically, the module lets the features of each domain extracted by the model perceive the class centers of the other domain's data features, so that the class centers of the two data domains approach each other. The embodiment of the invention thus also migrates knowledge learned from the target data set into the model training on the source data set, so that the model attends to the class distribution of the target data when extracting source-data features, improving the robustness of the model. Finally, the image features of the source data set and of the target data set processed by the cross domain class perception module are input to a discriminator D for discrimination. The discriminator judges the classification plausibility of the image features of the two data sets processed by the cross domain class perception module. The feature extraction networks and the cross domain class perception module act as generators, and the spatial distributions of the feature maps they generate need to be consistent, so that the discriminator cannot tell them apart. Of course, the discriminator is not a mandatory module.
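The generator-versus-discriminator setup described above can be illustrated with a standard binary cross-entropy adversarial objective. The patent does not state the discriminator's loss explicitly, so this formulation is an assumption for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(d_src, d_tgt):
    """Binary cross-entropy for a discriminator that should output 1 for
    source-domain attention features and 0 for target-domain ones.
    d_src, d_tgt: discriminator logits (any matching array shapes)."""
    return -(np.log(sigmoid(d_src)).mean() + np.log(1.0 - sigmoid(d_tgt)).mean())

def generator_adv_loss(d_tgt):
    """The segmentation network is trained so that target features are
    judged as source (label 1), pushing the two feature distributions
    toward each other."""
    return -np.log(sigmoid(d_tgt)).mean()
```

When the discriminator confidently separates the domains, the generator loss is large; the adversarial equilibrium is reached when the feature distributions are indistinguishable.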
Fig. 3 is a schematic process flow diagram of a domain adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention. Fig. 4 is a schematic diagram of the processing procedure of the cross domain class awareness module in the method. As shown in fig. 3, the features on the upper and lower sides represent, for the two data domains respectively, the feature map F output by the feature extraction network and the class score map G_c(F) output by the classifier. To reduce the computational effort, the features may first be channel-compressed; a 1×1 convolutional layer may be used for the channel compression. The compressed features and score map of one domain, together with the compressed features of the perceived domain, are then input to a cross domain class perceptron (Cross Domain Class Aware Block, CDCAB for short). It can be seen from fig. 3 that the output of each domain branch of the CDCAM module attends to the feature information of the other domain's data, which is the origin of the name of the cross domain class awareness module. A CDCAB consists mainly of two parts, a cross domain class center generator (Cross Domain Class Center Block) and a class attention module (Class Attention Block); the two parts can be represented by the functions G_CDCCB(·) and G_CAB(·) respectively, as shown in fig. 3. The outputs of the CDCAB in the two domains can then be expressed by the following equations, respectively:

Z_1 = G_CAB(G_{c1}(F_1), G_CDCCB(G_{c1}(F_1), A_2), F_1)

Z_2 = G_CAB(G_{c2}(F_2), G_CDCCB(G_{c2}(F_2), A_1), F_2)

where G_CDCCB produces the adjusted class centers, and G_CAB uses the current domain's score map as an attention map over those centers and adds the result to the input feature map through channel addition.
In fig. 3, N, H and W represent the number of channels, the feature height and the feature width of a feature map, A_1 denotes the feature map obtained by channel-compressing F_1, and A_2 denotes the feature map obtained by channel-compressing F_2. C represents the number of classes. In fig. 4, N′ represents the number of channels after compression.
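The channel compression step can be sketched as a per-pixel linear map, since a 1×1 convolution acts independently on each spatial position. The sizes below are illustrative toy values, not taken from the embodiment.

```python
import numpy as np

# Hypothetical sizes: N input channels, N' compressed channels, H×W spatial grid.
N, Np, H, W = 8, 4, 3, 3
rng = np.random.default_rng(0)

F1 = rng.standard_normal((N, H, W))    # feature map from the extractor
W_1x1 = rng.standard_normal((Np, N))   # weights of a 1x1 convolution (no bias, toy)

# A 1x1 convolution is a per-pixel linear map over channels: flatten the
# spatial grid, then a single matrix multiply.
A1 = (W_1x1 @ F1.reshape(N, H * W)).T  # shape (H*W, N'), matching A ∈ R^{HW×N'}
assert A1.shape == (H * W, Np)

# Spot-check one pixel against the per-pixel definition of a 1x1 convolution.
assert np.allclose(A1[0], W_1x1 @ F1[:, 0, 0])
```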
Further, based on the above embodiment, adjusting the class centers of the first feature map and the second feature map by the cross domain class center generators of the two cross domain class perceptrons specifically includes: performing an inner product operation on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map; and performing an inner product operation on the second classification score map and the first feature map to obtain the adjusted class center of the second feature map.
As shown in fig. 3, an inner product operation is performed on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map. Similarly, an inner product operation is performed on the second classification score map and the first feature map to obtain the adjusted class center of the second feature map (not shown in fig. 3).
Among the multiple classes in the final semantic segmentation prediction map, the class center of the ith class can be represented by the following formula:

F_class^i = ( Σ_{j=1}^{H×W} [y_j = i] · F_j ) / ( Σ_{j=1}^{H×W} [y_j = i] )

wherein F_j denotes the feature vector of the jth pixel in the feature map F ∈ R^{C×H×W}, y_j ∈ R^{1×HW} is the ground-truth label, and [y_j = i] indicates whether the ground-truth label of the jth pixel is the ith class, taking the value 1 if so and 0 otherwise. The cross domain class awareness module therefore adjusts the class center of the current domain's features according to this formula, combined with the feature information of the perceived domain, so that the class center can approach the class center of the perceived domain.
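As a small numeric illustration of the class-center formula above (toy features and labels, not from the embodiment): the class center of class i is simply the mean feature of the pixels whose ground-truth label equals i.

```python
import numpy as np

# Toy example: 5 pixels with 2-dim features, 2 classes.
F = np.array([[1., 0.], [3., 0.], [0., 2.], [0., 4.], [0., 6.]])  # (HW, d)
y = np.array([0, 0, 1, 1, 1])                                      # ground-truth labels

def class_center(F, y, i):
    mask = (y == i).astype(float)                  # the indicator [y_j = i]
    return (mask[:, None] * F).sum(0) / mask.sum() # weighted sum / count

# Class 0 center is the mean of [1,0] and [3,0]; class 1 of [0,2],[0,4],[0,6].
assert np.allclose(class_center(F, y, 0), [2.0, 0.0])
assert np.allclose(class_center(F, y, 1), [0.0, 4.0])
```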
Since the target dataset does not directly provide label information, and the aim is to use the feature information of the other domain to adjust the class center of the current domain's features, the classification score map of the current domain is used as rough label information, and the compressed feature map of the pixels in the perceived domain's feature map takes the place of F_j. Moreover, the source dataset and the target dataset actually perceive each other, so the following expressions hold:
the class center after the first feature map is adjusted is expressed as:
the class center after the second feature map is adjusted is expressed as:
wherein,the class center representing the ith class of the source data, H representing the feature height, W representing the feature width, j representing the number of pixels, G c1 (F 1 ) Representing the first classification score graph, [ G ] c1 (F 1 )] i,j Indicating whether the jth pixel in the first classification score map belongs to the ith class, wherein the value is 1, and the value is 0; [ A ] 2 ] j Representing a feature distribution of a j-th pixel in the second feature map; />The class center, G, representing the ith class of the target data c2 (F 2 ) Representing the second classification score map, [ G ] c2 (F 2 )] i,j Indicating whether the jth pixel in the second classification score map belongs to the ith class, wherein the value is 1, and the value is 0; [ A ] 1 ] j Representing the feature distribution of the jth pixel in the first feature map. G c1 (F 1 )∈R c1×HW ,G c2 (F 2 )∈R c2×HW ,A 1 、A 2 ∈R HW×N′ Every category center->
The construction of the cross domain class center generator has two advantages. First, the feature map of the other domain's data is added to this module, so that each feature center can capture global information of the other domain's data. Second, the class center obtained by the module coordinates the consistency between each pixel and the class information, so that the class center of the current domain can be fine-tuned by this operation and becomes more compatible with the class center of the other domain. After cross domain perception, feature points with ambiguous classification are separated more clearly and are therefore easier to recognize; meanwhile, the centers of the same class in different domains become closer, so that one model can complete the segmentation of images from both domains.
Further, based on the foregoing embodiment, performing distribution adjustment on the classification-ambiguous feature points of the first feature map and the second feature map by the class attention module, to obtain a first attention feature map and a second attention feature map respectively, includes: taking the first classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the source data to obtain a first class attention feature; performing channel addition on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the target data to obtain a second class attention feature; and performing channel addition on the second class attention feature and the second feature map to obtain the second attention feature map.
In different domains, the feature distributions of some classes are similar, so not all feature points in the two domains need to be adjusted. Instead, attention should focus on the feature points whose classification is ambiguous, so that they can be classified unambiguously. Inspired by the attention mechanism, a class attention module (Class Attention Block) is constructed in an embodiment of the invention. In the class score map of the current domain, the class scores of some pixels are relatively close, i.e. the classifier is highly uncertain about which class these pixels should be assigned to, so they are more likely to be misclassified. These points need more attention and should be emphasized in subsequent processing. Therefore, following the idea of the attention mechanism, the class score map of the current domain is used as an attention map over the adjusted class centers to obtain the class attention feature, which is then added to the input through channel addition.
As shown in fig. 3, performing distribution adjustment on the classification-ambiguous feature points of the first feature map and the second feature map by the class attention module, to obtain a first attention feature map and a second attention feature map respectively, specifically includes: taking the first classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the source data to obtain a first class attention feature; performing channel addition on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the target data to obtain a second class attention feature; and performing channel addition on the second class attention feature and the second feature map to obtain the second attention feature map.
From the description of the above embodiments, the cross domain class center F_class ∈ R^{C×N′} is already available, where C represents the number of classes. The class score map G_F ∈ R^{C×H×W} of the current domain is reshaped so that its dimension becomes C×HW. Finally a class attention feature map Z_a ∈ R^{N′×HW} is obtained, which is then reshaped to Z_a ∈ R^{N′×H×W}, wherein:
the first attention profile is expressed as:
wherein,the first attention profile representing the jth pixel of the kth channel of the source image, C 1 Representing the number of categories of the source image, i representing the category number, G c1 (F 1 ) Representing the first classification score graph, [ G ] c1 (F 1 )] i,j Indicating whether the jth pixel in the first classification score map belongs to the ith class, wherein the value is 1, and the value is 0; />Representing the class center of a kth pixel of a kth channel of the source image;
the second attention profile is expressed as:
wherein,the second attention profile representing the jth pixel of the kth channel of the target image, C 2 The number of categories representing the target image, i representing the category number, G c2 (F 2 ) Representing the second classification score map, [ G ] c2 (F 2 )] i,j Indicating whether the jth pixel in the second classification score map belongs to the ith class, wherein the value is 1, and the value is 0; />Representing the class center of the kth channel jth pixel of the target image.
After the first attention feature map and the second attention feature map are obtained, a 1×1 convolutional layer can be used to fine-tune the output attention feature maps, making the result more accurate.
Therefore, the domain adaptive semantic segmentation method based on cross domain class perception designed in the embodiment of the invention can adjust the class center of the current domain according to the feature content of the perceived domain, so that the trained model adjusts the feature information of pixels whose classes are ambiguous in the classification score map. In the end, the class center of the perceiving features approaches the class center of the perceived domain, and the domain adaptation task is accomplished better.
Fig. 5 is a schematic structural diagram of a domain-adaptive semantic segmentation device based on cross domain category awareness according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes a preprocessing module 10, a feature classification module 20, a feature map adjustment module 30, and a semantic segmentation module 40, where:
the preprocessing module 10 is used for: converting the style of the source image in the source data set into the style of the target image in the target data set through a style migration network to obtain a source adaptation image; wherein the source adaptation image is consistent with tag data of the source image; the feature classification module 20 is configured to: processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature image and a first classification score image; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature image and a second classification score image; the feature map adjustment module 30 is configured to: inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module; the cross domain type perception module comprises two cross domain type perceptrons, wherein each cross domain type perceptrons comprises a cross domain type center generator and a type attention module which are sequentially connected, and the type centers of the first feature map and the second feature map are adjusted through the cross domain type center generators of the two cross domain type perceptrons respectively so that the type centers of the first feature map and the second feature map are close; the classification fuzzy feature points of the first feature map and the second feature map are respectively distributed and adjusted through the classification attention module, so that a first attention feature map and a second attention feature map are respectively obtained; the semantic segmentation module 40 is configured to: and carrying out semantic segmentation on the source image according to the first attention characteristic diagram and carrying out semantic segmentation on the target image 
according to the second attention characteristic diagram.
In the embodiment of the invention, by providing a cross domain class perception module comprising a cross domain class center generator and a class attention module, the model attends to the class centers of the other domain's data features when extracting features of one domain and, combined with the attention mechanism, adjusts the features of classification-ambiguous pixels in both domains, so that the class centers of the same class in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is realized.
The device provided in the embodiment of the present invention is used to execute the above method; for its specific functions, reference may be made to the above method flow, which is not repeated here.
Fig. 6 illustrates a physical schematic diagram of an electronic device. As shown in fig. 6, the electronic device may include: a processor 610, a communications interface (Communications Interface) 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a domain adaptive semantic segmentation method based on cross domain class awareness, the method comprising: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image, wherein the source adaptation image is consistent with the label data of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, embodiments of the present invention further provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the domain adaptive semantic segmentation method based on cross domain class awareness provided by the above method embodiments, the method comprising: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image, wherein the source adaptation image is consistent with the label data of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.
In yet another aspect, embodiments of the present invention further provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the domain adaptive semantic segmentation method based on cross domain class awareness provided by the above embodiments, the method comprising: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image, wherein the source adaptation image is consistent with the label data of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. The domain adaptive semantic segmentation method based on cross domain category perception is characterized by comprising the following steps of:
converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image; wherein the source adaptation image is consistent with the label data of the source image;
processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map;
inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module; wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained;
performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map; wherein the adjusting, by the cross domain class center generators of the two cross domain class perceptrons, the class centers of the first feature map and the second feature map includes:
performing an inner product operation on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map;
and performing an inner product operation on the second classification score map and the first feature map to obtain the adjusted class center of the second feature map.
2. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 1, wherein the adjusted class center of the first feature map is expressed as:

F_{class,s}^i = ( Σ_{j=1}^{H×W} [G_{c1}(F_1)]_{i,j} · [A_2]_j ) / ( Σ_{j=1}^{H×W} [G_{c1}(F_1)]_{i,j} )

and the adjusted class center of the second feature map is expressed as:

F_{class,t}^i = ( Σ_{j=1}^{H×W} [G_{c2}(F_2)]_{i,j} · [A_1]_j ) / ( Σ_{j=1}^{H×W} [G_{c2}(F_2)]_{i,j} )

wherein F_{class,s}^i denotes the class center of the ith class of the source data, H denotes the feature height, W denotes the feature width, j indexes the pixels, G_{c1}(F_1) denotes the first classification score map, and [G_{c1}(F_1)]_{i,j} indicates whether the jth pixel in the first classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [A_2]_j denotes the feature distribution of the jth pixel in the second feature map; F_{class,t}^i denotes the class center of the ith class of the target data, G_{c2}(F_2) denotes the second classification score map, and [G_{c2}(F_2)]_{i,j} indicates whether the jth pixel in the second classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [A_1]_j denotes the feature distribution of the jth pixel in the first feature map.
3. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 1, wherein performing distribution adjustment on the classification-ambiguous feature points of the first feature map and the second feature map by the class attention module, to obtain a first attention feature map and a second attention feature map respectively, specifically includes:

taking the first classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the source data to obtain a first class attention feature; performing channel addition on the first class attention feature and the first feature map to obtain the first attention feature map;

taking the second classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the target data to obtain a second class attention feature; and performing channel addition on the second class attention feature and the second feature map to obtain the second attention feature map.
4. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 3, wherein the first attention feature map is expressed as:

[Z_1]_{k,j} = Σ_{i=1}^{C1} [G_{c1}(F_1)]_{i,j} · [F_{class,s}^i]_k

wherein [Z_1]_{k,j} denotes the first attention feature map at the jth pixel of the kth channel of the source image, C1 denotes the number of classes of the source image, i denotes the class index, G_{c1}(F_1) denotes the first classification score map, and [G_{c1}(F_1)]_{i,j} indicates whether the jth pixel in the first classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [F_{class,s}^i]_k denotes the kth channel component of the adjusted class center of the ith class of the source data;

and the second attention feature map is expressed as:

[Z_2]_{k,j} = Σ_{i=1}^{C2} [G_{c2}(F_2)]_{i,j} · [F_{class,t}^i]_k

wherein [Z_2]_{k,j} denotes the second attention feature map at the jth pixel of the kth channel of the target image, C2 denotes the number of classes of the target image, i denotes the class index, G_{c2}(F_2) denotes the second classification score map, and [G_{c2}(F_2)]_{i,j} indicates whether the jth pixel in the second classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [F_{class,t}^i]_k denotes the kth channel component of the adjusted class center of the ith class of the target data.
5. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 3, further comprising:

fine-tuning the first attention feature map and the second attention feature map with a 1×1 convolutional layer.
6. The method of claim 1, further comprising, prior to said processing the source adapted image sequentially through a first feature extraction network and a first classifier: channel compressing the source adaptation image;
before the target image is sequentially processed through the second feature extraction network and the second classifier, the method further includes: and carrying out channel compression on the target image.
7. A domain adaptive semantic segmentation device based on cross domain class perception, characterized by comprising:
the preprocessing module is used for: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image; wherein the source adaptation image is consistent with the label data of the source image;
the feature classification module is used for: processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; and processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map;
the feature map adjustment module is used for: inputting the first feature map, the first classification score map, the second feature map and the second classification score map into a cross-domain category awareness module, wherein the cross-domain category awareness module comprises two cross-domain category perceivers, each cross-domain category perceiver comprising a cross-domain category center generator and a category attention module connected in sequence; adjusting the category centers of the first feature map and the second feature map through the cross-domain category center generators of the two cross-domain category perceivers respectively, so that the category centers of the first feature map and the second feature map are drawn close to each other; and adjusting the distributions of the ambiguously classified feature points of the first feature map and the second feature map through the category attention modules respectively, so as to obtain a first attention feature map and a second attention feature map respectively;
the semantic segmentation module is used for: performing semantic segmentation on the source image according to the first attention feature map, and performing semantic segmentation on the target image according to the second attention feature map;
the feature map adjustment module is specifically configured, when adjusting the category centers of the first feature map and the second feature map through the cross-domain category center generators of the two cross-domain category perceivers respectively, to: perform an inner product operation on the first classification score map and the second feature map to obtain the adjusted category centers of the first feature map; and perform an inner product operation on the second classification score map and the first feature map to obtain the adjusted category centers of the second feature map.
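The inner product in claim 7 amounts to aggregating one domain's features with the other domain's per-class pixel assignments; with row normalization it yields a score-weighted mean feature per category. A minimal sketch (the normalization, names, and toy values are assumptions beyond the claim's bare "inner product operation"):

```python
import numpy as np

def cross_domain_class_centers(score_map, feature_map):
    """Inner product of a (C, N) classification score map from one domain
    with a (K, N) feature map from the other domain.

    Rows of the score map are normalized so the result is a weighted mean.
    Returns (K, C): the center of category i in channel k.
    """
    weights = score_map / np.maximum(score_map.sum(axis=1, keepdims=True), 1e-8)
    return feature_map @ weights.T

# Toy example: 2 categories over 4 pixels, 1 feature channel.
score = np.array([[1.0, 1.0, 0.0, 0.0],    # pixels 0,1 -> category 0
                  [0.0, 0.0, 1.0, 1.0]])   # pixels 2,3 -> category 1
feat = np.array([[1.0, 3.0, 5.0, 7.0]])
mu = cross_domain_class_centers(score, feat)  # mean feature per category
```

Because the score map comes from one domain and the features from the other, the resulting centers mix information across domains, which is what draws the two domains' category centers together.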
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the domain-adaptive semantic segmentation method based on cross domain category awareness according to any of claims 1 to 6 when executing the computer program.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the domain-adaptive semantic segmentation method based on cross domain category awareness according to any of claims 1 to 6.
CN202010773728.3A 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception Active CN112016592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010773728.3A CN112016592B (en) 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010773728.3A CN112016592B (en) 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception

Publications (2)

Publication Number Publication Date
CN112016592A CN112016592A (en) 2020-12-01
CN112016592B true CN112016592B (en) 2024-01-26

Family

ID=73499087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010773728.3A Active CN112016592B (en) 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception

Country Status (1)

Country Link
CN (1) CN112016592B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205096B (en) 2021-04-26 2022-04-15 武汉大学 Attention-based combined image and feature self-adaptive semantic segmentation method
CN112990378B (en) * 2021-05-08 2021-08-13 腾讯科技(深圳)有限公司 Scene recognition method and device based on artificial intelligence and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011072259A1 (en) * 2009-12-10 2011-06-16 Indiana University Research & Technology Corporation System and method for segmentation of three dimensional image data
CN108960260A (en) * 2018-07-12 2018-12-07 东软集团股份有限公司 A kind of method of generating classification model, medical image image classification method and device
CN110399856A (en) * 2019-07-31 2019-11-01 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and its equipment
CN110991516A (en) * 2019-11-28 2020-04-10 哈尔滨工程大学 Side-scan sonar image target classification method based on style migration
CN111340039A (en) * 2020-02-12 2020-06-26 杰创智能科技股份有限公司 Target detection method based on feature selection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160286156A1 (en) * 2015-02-12 2016-09-29 Creative Law Enforcement Resources, Inc. System for managing information related to recordings from video/audio recording devices


Also Published As

Publication number Publication date
CN112016592A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
US12039440B2 (en) Image classification method and apparatus, and image classification model training method and apparatus
CN111160264B (en) Cartoon character identity recognition method based on generation countermeasure network
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN112560831A (en) Pedestrian attribute identification method based on multi-scale space correction
CN112836625A (en) Face living body detection method and device and electronic equipment
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN110211127A (en) Image partition method based on bicoherence network
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
Xiao et al. Pedestrian object detection with fusion of visual attention mechanism and semantic computation
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
CN117275074A (en) Facial expression recognition method based on broad attention and multi-scale fusion mechanism
Zia et al. An adaptive training based on classification system for patterns in facial expressions using SURF descriptor templates
Vijayalakshmi K et al. Copy-paste forgery detection using deep learning with error level analysis
CN113850182B (en) DAMR _ DNet-based action recognition method
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN113435315A (en) Expression recognition method based on double-path neural network feature aggregation
Cai et al. Vehicle detection based on visual saliency and deep sparse convolution hierarchical model
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN114332884B (en) Document element identification method, device, equipment and storage medium
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN114972965A (en) Scene recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant