
CN112016592B - Domain adaptive semantic segmentation method and device based on cross domain category perception - Google Patents

Domain adaptive semantic segmentation method and device based on cross domain category perception

Info

Publication number
CN112016592B
CN112016592B (granted publication of application CN202010773728.3A)
Authority
CN
China
Prior art keywords
feature
feature map
attention
map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010773728.3A
Other languages
Chinese (zh)
Other versions
CN112016592A (en)
Inventor
李仕仁
王金桥
朱贵波
胡建国
张海
赵朝阳
林格
谭大伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nexwise Intelligence China Ltd
Original Assignee
Nexwise Intelligence China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nexwise Intelligence China Ltd filed Critical Nexwise Intelligence China Ltd
Priority to CN202010773728.3A priority Critical patent/CN112016592B/en
Publication of CN112016592A publication Critical patent/CN112016592A/en
Application granted granted Critical
Publication of CN112016592B publication Critical patent/CN112016592B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a domain-adaptive semantic segmentation method and device based on cross domain class perception. The method includes: converting the style of the source image into the style of the target image, then extracting and classifying features of each; inputting the feature maps and classification score maps into a cross domain class perception module; adjusting the class centers of the feature maps through the cross domain class center generators of the two cross domain class perceptrons so that the class centers of the two feature maps approach each other; and adjusting the classification-ambiguous feature points of the feature maps through the class attention modules to obtain a first attention feature map and a second attention feature map for semantic segmentation. When extracting features of one domain, the model of the embodiment attends to the class centers of the other domain's data features and, combined with an attention mechanism, adjusts the classification-ambiguous pixel features in both domains, so that the class centers of same-class features in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is achieved.

Description

Domain adaptive semantic segmentation method and device based on cross domain category perception
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a field-adaptive semantic segmentation method and device based on cross field category perception.
Background
Labeling semantic segmentation data requires substantial manual effort. Real data sets for semantic segmentation therefore typically contain only a small number of samples, which limits the generalization of models to diverse real-world cases. A common solution is unsupervised semantic segmentation: a model trained on computer-synthesized data sets is applied to data sets of similar real scenes. To reduce the loss of real feature information, a domain adaptation method is needed to reduce the difference in the feature-space distributions of images from data sets in different domains. Traditional domain adaptation methods typically consider how to migrate knowledge from the computer-synthesized domain to the real scene, without considering what knowledge is migrated; in short, they address "how to adapt" but not "what to adapt".
The image content of different domains has some similarity; for example, the categories within the pictures are approximately the same. Thus, the feature spaces of the same class in different domain data sets, extracted with the same model, should be similar, and so should the class centers. However, there is often a difference between the feature distributions of the same class in data sets of real scenes and of computer-synthesized scenes. Therefore, how to achieve domain adaptation by reducing the difference between the feature distributions of different domains is a problem to be solved.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiment of the invention provides a field-adaptive semantic segmentation method and device based on cross field category perception.
In a first aspect, an embodiment of the present invention provides a domain-adaptive semantic segmentation method based on cross domain class perception, the method including: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptation image, where the label data of the source adaptation image is consistent with that of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map into a cross domain class perception module, where the cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence; adjusting the class centers of the first feature map and the second feature map through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that the class centers of the first feature map and the second feature map approach each other; adjusting the distribution of the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules, respectively, to obtain a first attention feature map and a second attention feature map; and performing semantic segmentation on the source image according to the first attention feature map and on the target image according to the second attention feature map.
Further, the adjusting of the class centers of the first feature map and the second feature map through the cross domain class center generators of the two cross domain class perceptrons specifically includes: performing an inner product operation on the first classification score map and the second feature map to obtain the adjusted class centers of the first feature map; and performing an inner product operation on the second classification score map and the first feature map to obtain the adjusted class centers of the second feature map.
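The inner-product operation between a classification score map and the opposite domain's feature map can be sketched in NumPy. This is an illustrative reconstruction, not the patent's implementation: the array shapes, the one-hot form of the score map, and the per-class averaging are assumptions.

```python
import numpy as np

def cross_domain_class_centers(score_map, other_features):
    """Sketch of a cross-domain class center generator (assumed shapes).

    score_map:      (C, H*W) one-hot score map; entry (i, j) is 1 iff pixel j
                    is assigned to class i by THIS domain's classifier.
    other_features: (H*W, N) per-pixel features from the OTHER domain.

    Returns (C, N): for each class, the mean feature of the other domain's
    pixels assigned to that class -- the "inner product" of the score map
    with the opposite domain's feature map.
    """
    counts = score_map.sum(axis=1, keepdims=True)   # (C, 1) pixels per class
    sums = score_map @ other_features               # (C, N) inner product
    return sums / np.maximum(counts, 1)             # mean feature per class
```

Because the source score map is multiplied with the target features (and vice versa), each domain's class centers are pulled toward the other domain's feature statistics, which is the cross-domain perception the claim describes.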
Further, the adjusted class center of the first feature map is expressed as:

$$C_s^i = \frac{\sum_{j=1}^{H \times W} [G_{c1}(F_1)]_{i,j} \, [A_2]_j}{\sum_{j=1}^{H \times W} [G_{c1}(F_1)]_{i,j}}$$

and the adjusted class center of the second feature map is expressed as:

$$C_t^i = \frac{\sum_{j=1}^{H \times W} [G_{c2}(F_2)]_{i,j} \, [A_1]_j}{\sum_{j=1}^{H \times W} [G_{c2}(F_2)]_{i,j}}$$

where C_s^i denotes the class center of the i-th class of the source data, H denotes the feature height, W the feature width, and j the pixel index; G_c1(F_1) denotes the first classification score map, and [G_c1(F_1)]_{i,j} indicates whether the j-th pixel in the first classification score map belongs to the i-th class (1 if it does, 0 otherwise); [A_2]_j denotes the feature distribution of the j-th pixel in the second feature map. C_t^i denotes the class center of the i-th class of the target data; G_c2(F_2) denotes the second classification score map, and [G_c2(F_2)]_{i,j} indicates whether the j-th pixel in the second classification score map belongs to the i-th class (1 if it does, 0 otherwise); [A_1]_j denotes the feature distribution of the j-th pixel in the first feature map.
Further, the adjusting of the distribution of the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules to obtain the first attention feature map and the second attention feature map specifically includes: taking the first classification score map as an attention map and performing an inner product operation with the adjusted class centers of the source data to obtain a first class attention feature; performing channel-wise addition of the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map and performing an inner product operation with the adjusted class centers of the target data to obtain a second class attention feature; and performing channel-wise addition of the second class attention feature and the second feature map to obtain the second attention feature map.
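The class attention step (score map used as attention over the adjusted class centers, followed by channel-wise addition) can be sketched as follows. Shapes and the one-hot score map are assumptions made for illustration.

```python
import numpy as np

def class_attention(features, score_map, centers):
    """Sketch of a class attention module (assumed shapes).

    features:  (N, H*W) this domain's feature map
    score_map: (C, H*W) classification score map, used as the attention map
    centers:   (C, N)   adjusted class centers from the center generator

    Each pixel receives the score-weighted sum of class centers (the class
    attention feature), which is then added channel-wise to the features.
    """
    attention = centers.T @ score_map   # (N, H*W): sum_i score[i,j]*centers[i,k]
    return features + attention         # channel-wise addition
```

Pixels whose scores spread over several classes receive a blend of several class centers, which is how the module nudges classification-ambiguous points toward the shared centers.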
Further, the first attention feature map is expressed as:

$$[Z_1]_{k,j} = [F_1]_{k,j} + \sum_{i=1}^{C_1} [G_{c1}(F_1)]_{i,j} \, [C_s^i]_k$$

where [Z_1]_{k,j} denotes the first attention feature of the j-th pixel of the k-th channel of the source image, C_1 denotes the number of classes of the source image, i denotes the class index, G_c1(F_1) denotes the first classification score map, [G_c1(F_1)]_{i,j} indicates whether the j-th pixel in the first classification score map belongs to the i-th class (1 if it does, 0 otherwise), [C_s^i]_k denotes the k-th channel component of the i-th class center of the source data, and [F_1]_{k,j} denotes the feature of the j-th pixel of the k-th channel of the first feature map;

the second attention feature map is expressed as:

$$[Z_2]_{k,j} = [F_2]_{k,j} + \sum_{i=1}^{C_2} [G_{c2}(F_2)]_{i,j} \, [C_t^i]_k$$

where [Z_2]_{k,j} denotes the second attention feature of the j-th pixel of the k-th channel of the target image, C_2 denotes the number of classes of the target image, i denotes the class index, G_c2(F_2) denotes the second classification score map, [G_c2(F_2)]_{i,j} indicates whether the j-th pixel in the second classification score map belongs to the i-th class (1 if it does, 0 otherwise), [C_t^i]_k denotes the k-th channel component of the i-th class center of the target data, and [F_2]_{k,j} denotes the feature of the j-th pixel of the k-th channel of the second feature map.
Further, the method further includes: refining the first attention feature map and the second attention feature map with a 1 × 1 convolution layer.
Further, before processing the source adaptation image sequentially through the first feature extraction network and the first classifier, the method further includes: performing channel compression on the source adaptation image; and before processing the target image sequentially through the second feature extraction network and the second classifier, the method further includes: performing channel compression on the target image.
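Both the channel compression mentioned here and the 1 × 1 refinement above rely on the fact that a 1 × 1 convolution is a per-pixel linear map that mixes channels without mixing spatial positions. A minimal NumPy sketch (the weight matrix is hypothetical; in the embodiment it would be learned):

```python
import numpy as np

def conv1x1(x, weight):
    """A 1x1 convolution applied to a feature map.

    x:      (N, H, W)  input feature map
    weight: (N_out, N) the same projection matrix applied at every pixel
    Returns (N_out, H, W); N_out < N gives channel compression.
    """
    n, h, w = x.shape
    # Flatten spatial dims, project channels, restore spatial dims.
    return (weight @ x.reshape(n, h * w)).reshape(-1, h, w)
```

This is why a 1 × 1 layer is the standard cheap way to reduce N channels to N' before the heavier cross-domain computation.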
In a second aspect, an embodiment of the present invention provides a domain-adaptive semantic segmentation apparatus based on cross domain class perception, the apparatus including: a preprocessing module, configured to convert the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptation image, where the label data of the source adaptation image is consistent with that of the source image; a feature classification module, configured to process the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map, and to process the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; a feature map adjusting module, configured to input the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, where the cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that they approach each other, and the distribution of the classification-ambiguous feature points of the first feature map and the second feature map is adjusted through the class attention modules, respectively, to obtain a first attention feature map and a second attention feature map; and a semantic segmentation module, configured to perform semantic segmentation on the source image according to the first attention feature map and on the target image according to the second attention feature map.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first aspect.
According to the domain-adaptive semantic segmentation method and device based on cross domain class perception, by providing a cross domain class perception module comprising a cross domain class center generator and a class attention module, the model, when extracting features of one domain, attends to the class centers of the other domain's data features; combined with the attention mechanism, the classification-ambiguous pixel features of the two domains are adjusted so that the class centers of same-class features in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a domain-adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a domain-adaptive semantic segmentation method based on cross domain class awareness according to an embodiment of the present invention;
FIG. 3 is a schematic process flow diagram of a domain-adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a processing procedure of a cross domain class awareness module in a domain adaptive semantic segmentation method based on cross domain class awareness according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a domain-adaptive semantic segmentation device based on cross domain class awareness according to an embodiment of the present invention;
fig. 6 illustrates a physical structure diagram of an electronic device.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a domain adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 101, converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptation image; wherein the source adaptation image and the label data of the source image are consistent.
Semantic segmentation is a typical computer vision problem: raw data (e.g., planar images) are taken as input and converted into a mask whose regions of interest are highlighted. The term full-pixel semantic segmentation is often used, meaning that each pixel in an image is assigned a class ID according to the object of interest to which it belongs.
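The full-pixel assignment described here amounts to an arg-max over per-class scores at every pixel. A minimal sketch, assuming the score map has already been computed by a classifier:

```python
import numpy as np

def segment(score_map):
    """Full-pixel semantic segmentation: assign each pixel the class
    with the highest classification score.

    score_map: (C, H, W) per-class scores
    Returns (H, W) integer class-ID mask.
    """
    return score_map.argmax(axis=0)
```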
The domain-adaptive semantic segmentation method based on cross domain class perception provided by the embodiment of the invention (which may also be called a domain-adaptive semantic segmentation method based on cross domain class perception adversarial learning) is mainly used for domain adaptation of a model between a computer-synthesized data set and a real-scene data set, and aims to solve the segmentation problem of a target data set that has no label data.
In the embodiment of the invention, the source data set is composed of labeled images, called source images; the target data set is composed of unlabeled images, called target images. On the one hand, the labeled data set can assist in achieving accurate semantic segmentation of the unlabeled data set; on the other hand, knowledge learned from the target data set can be migrated into the model training on the source data set, so that the class centers of the source and target data sets approach each other and more accurate semantic segmentation is achieved for images in both data sets.
Because the processing of the source image in the source data set and of the target image in the target data set is similar in the embodiment of the invention, the source image and the target image need a unified style so that the class centers of their feature maps can approach each other. In addition, since one purpose is to use the source image to train the semantic segmentation model for the target image, the style of the source image in the source data set is converted into the style of the target image in the target data set through the style migration network to obtain the source adaptation image, whose label data is consistent with that of the source image.
Step 102, processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; and processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map.
After the styles are unified, feature extraction is performed on the source adaptation image through the first feature extraction network to obtain the first feature map; the first feature map is then input to the first classifier to obtain the first classification score map, which contains the classification score of each pixel of the source image. Feature extraction is performed on the target image through the second feature extraction network to obtain the second feature map; the second feature map is then input to the second classifier to obtain the second classification score map, which contains the classification score of each pixel of the target image.
Step 103, inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module; the cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence; the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that the class centers of the first feature map and the second feature map approach each other; and the distribution of the classification-ambiguous feature points of the first feature map and the second feature map is adjusted through the class attention modules, respectively, to obtain the first attention feature map and the second attention feature map.
The embodiment of the invention trains the model with a labeled source data set and an unlabeled target data set. Because the source data set has label data, a model that classifies source-data features can be trained; but because the class features of same-class target data differ somewhat from those of the source data, a model trained only on source data classifies target-data features poorly. The embodiment of the invention therefore provides a cross domain class perception module, which makes the class centers of same-class target data approximately the same as those of the source data, so that the model learned from the supervised source data is applicable to classifying the target data. Since the feature distributions of the same class in different domains differ, the embodiment makes the model cross-perceive the feature distribution of the opposite domain when extracting features, so that the class centers of the different domains approach each other and the feature distributions of the same class finally become consistent.
Specifically, the first feature map, the first classification score map, the second feature map and the second classification score map are input to the cross domain class perception module. The cross domain class perception module includes two cross domain class perceptrons, each including a cross domain class center generator and a class attention module connected in sequence. The class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons, respectively, so that they approach each other: the class center of the first feature map is adjusted through the cross domain class center generator of one perceptron, and the class center of the second feature map through that of the other. In the classification score maps, the class scores of some pixels are relatively close, i.e., the classifier is relatively uncertain about which class those pixels belong to, so they are more likely to be misclassified. These points need more attention and should be emphasized in subsequent processing.
The distribution of the classification-ambiguous feature points of the first feature map and the second feature map is then adjusted through the class attention modules, respectively, according to the adjusted class centers and using an attention mechanism, obtaining the first attention feature map and the second attention feature map. That is, the classification-ambiguous feature points of the first feature map are adjusted through the class attention module of one of the two cross domain class perceptrons, and those of the second feature map through the class attention module of the other.
Step 104, performing semantic segmentation on the source image according to the first attention feature map and on the target image according to the second attention feature map.
Semantic segmentation is performed on the source image and the target image according to the first and second attention feature maps, respectively. The first and second attention feature maps are still feature maps in nature, so existing methods for semantic segmentation from feature maps can be used: segmenting according to the first attention feature map yields the source-image segmentation result, and segmenting according to the second attention feature map yields the target-image segmentation result.
By providing the cross domain class perception module comprising the cross domain class center generator and the class attention module, the embodiment of the invention makes the model, when extracting features of one domain, attend to the class centers of the other domain's data features, and combines an attention mechanism to adjust the classification-ambiguous pixel features in the two domains, so that the class centers of same-class features in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is achieved.
Fig. 2 is a schematic diagram of the domain-adaptive semantic segmentation method based on cross domain class perception according to an embodiment of the present invention. As shown in Fig. 2, the style of the source data set image is first converted to the style of the target data set image through a style migration network. The label data of the image after style migration is consistent with that of the source image, and the result is called the source adaptation image. The resulting source adaptation image A_{s→t} is then input to feature extraction network G_{f1} (the first feature extraction network) for feature extraction, obtaining the first feature map F_{s→t} (denoted F_1 in Figs. 3 and 4), which is passed through classifier G_{c1} (the first classifier) to obtain the first classification score map G_{c1}(F_{s→t}) (denoted G_{c1}(F_1) in Figs. 3 and 4). The target image A_t is input to feature extraction network G_{f2} (the second feature extraction network) for feature extraction, obtaining the second feature map F_t (denoted F_2 in Figs. 3 and 4), which is passed through classifier G_{c2} (the second classifier) to obtain the second classification score map G_{c2}(F_t) (denoted G_{c2}(F_2) in Figs. 3 and 4). The obtained feature maps and classification score maps are input to the constructed cross domain class perception module CDCAM, yielding the first attention feature map Z_{s→t} (denoted Z_1 in Figs. 3 and 4) and the second attention feature map Z_t (denoted Z_2 in Fig. 3).
The main task of the cross domain class perception module is mutual perception based on the class score maps of the source and target data and the features extracted from the opposite domain, promoting mutual adaptation of the class centers of the two domains. Specifically, the module lets the features of each domain extracted by the model perceive the class centers of the other domain's data features, so that the class centers of the two data domains approach each other. The embodiment of the invention thus also migrates knowledge learned from the target data set into the model training on the source data set, so that the model attends to the class distribution of the target data when extracting source-data features, improving the robustness of the model. Finally, the image features of the source data set and of the target data set processed by the cross domain class perception module are input to a discriminator D for discrimination. The discriminator judges the classification plausibility of the image features of the two data sets processed by the cross domain class perception module. The feature extraction networks and the cross domain class perception module act as generators, and the spatial distributions of the feature maps they generate need to be consistent, so that the discriminator cannot tell them apart. Of course, the discriminator is not a mandatory module.
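The generator-versus-discriminator setup described above can be illustrated with a standard binary cross-entropy adversarial objective. The patent does not state the discriminator's loss explicitly, so this formulation is an assumption for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(d_src, d_tgt):
    """Binary cross-entropy for a discriminator that should output 1 for
    source-domain attention features and 0 for target-domain ones.
    d_src, d_tgt: discriminator logits (any matching array shapes)."""
    return -(np.log(sigmoid(d_src)).mean() + np.log(1.0 - sigmoid(d_tgt)).mean())

def generator_adv_loss(d_tgt):
    """The segmentation network is trained so that target features are
    judged as source (label 1), pushing the two feature distributions
    toward each other."""
    return -np.log(sigmoid(d_tgt)).mean()
```

When the discriminator confidently separates the domains, the generator loss is large; the adversarial equilibrium is reached when the feature distributions are indistinguishable.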
Fig. 3 is a schematic process flow diagram of a domain adaptive semantic segmentation method based on cross domain category awareness according to an embodiment of the present invention. Fig. 4 is a schematic diagram of the processing procedure of the cross domain class awareness module in the method. As shown in fig. 3, the features on the upper and lower sides represent, for the two data domains respectively, the feature map F output by the feature extraction network and the class score map G_c(F) output by the classifier. To reduce the computational effort, the features may first be channel-compressed; a 1×1 convolutional layer may be used for the channel compression. The compressed features and score map of one domain, together with the compressed features of the perceived domain, are then input to a cross domain class perceptron (Cross Domain Class Aware Block, CDCAB for short). It can be seen from fig. 3 that the output of each domain branch of the CDCAM module attends to the feature information of the other domain's data, which is the origin of the name of the cross domain class awareness module. A CDCAB consists mainly of two parts, a cross domain class center generator (Cross Domain Class Center Block) and a class attention module (Class Attention Block); the two parts can be represented by the functions G_CDCCB(·) and G_CAB(·) respectively, as shown in fig. 3. The outputs of the CDCAB in the two domains can then be expressed by the following equations, respectively:

Z_1 = G_CAB(G_{c1}(F_1), G_CDCCB(G_{c1}(F_1), A_2), F_1)

Z_2 = G_CAB(G_{c2}(F_2), G_CDCCB(G_{c2}(F_2), A_1), F_2)

where G_CDCCB produces the adjusted class centers, and G_CAB uses the current domain's score map as an attention map over those centers and adds the result to the input feature map through channel addition.
In fig. 3, N, H and W represent the number of channels, the feature height and the feature width of a feature map, A_1 denotes the feature map obtained by channel-compressing F_1, and A_2 denotes the feature map obtained by channel-compressing F_2. C represents the number of classes. In fig. 4, N′ represents the number of channels after compression.
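The channel compression step can be sketched as a per-pixel linear map, since a 1×1 convolution acts independently on each spatial position. The sizes below are illustrative toy values, not taken from the embodiment.

```python
import numpy as np

# Hypothetical sizes: N input channels, N' compressed channels, H×W spatial grid.
N, Np, H, W = 8, 4, 3, 3
rng = np.random.default_rng(0)

F1 = rng.standard_normal((N, H, W))    # feature map from the extractor
W_1x1 = rng.standard_normal((Np, N))   # weights of a 1x1 convolution (no bias, toy)

# A 1x1 convolution is a per-pixel linear map over channels: flatten the
# spatial grid, then a single matrix multiply.
A1 = (W_1x1 @ F1.reshape(N, H * W)).T  # shape (H*W, N'), matching A ∈ R^{HW×N'}
assert A1.shape == (H * W, Np)

# Spot-check one pixel against the per-pixel definition of a 1x1 convolution.
assert np.allclose(A1[0], W_1x1 @ F1[:, 0, 0])
```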
Further, based on the above embodiment, adjusting the class centers of the first feature map and the second feature map by the cross domain class center generators of the two cross domain class perceptrons specifically includes: performing an inner product operation on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map; and performing an inner product operation on the second classification score map and the first feature map to obtain the adjusted class center of the second feature map.
As shown in fig. 3, an inner product operation is performed on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map. Similarly, an inner product operation is performed on the second classification score map and the first feature map to obtain the adjusted class center of the second feature map (not shown in fig. 3).
Among the multiple classes in the final semantic segmentation prediction map, the class center of the ith class can be represented by the following formula:

F_class^i = ( Σ_{j=1}^{H×W} [y_j = i] · F_j ) / ( Σ_{j=1}^{H×W} [y_j = i] )

wherein F_j denotes the feature vector of the jth pixel in the feature map F ∈ R^{C×H×W}, y_j ∈ R^{1×HW} is the ground-truth label, and [y_j = i] indicates whether the ground-truth label of the jth pixel is the ith class, taking the value 1 if so and 0 otherwise. The cross domain class awareness module therefore adjusts the class center of the current domain's features according to this formula, combined with the feature information of the perceived domain, so that the class center can approach the class center of the perceived domain.
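As a small numeric illustration of the class-center formula above (toy features and labels, not from the embodiment): the class center of class i is simply the mean feature of the pixels whose ground-truth label equals i.

```python
import numpy as np

# Toy example: 5 pixels with 2-dim features, 2 classes.
F = np.array([[1., 0.], [3., 0.], [0., 2.], [0., 4.], [0., 6.]])  # (HW, d)
y = np.array([0, 0, 1, 1, 1])                                      # ground-truth labels

def class_center(F, y, i):
    mask = (y == i).astype(float)                  # the indicator [y_j = i]
    return (mask[:, None] * F).sum(0) / mask.sum() # weighted sum / count

# Class 0 center is the mean of [1,0] and [3,0]; class 1 of [0,2],[0,4],[0,6].
assert np.allclose(class_center(F, y, 0), [2.0, 0.0])
assert np.allclose(class_center(F, y, 1), [0.0, 4.0])
```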
Since the target dataset does not directly provide label information, and the aim is to use the feature information of the other domain to adjust the class center of the current domain's features, the classification score map of the current domain is used as rough label information, and the compressed feature map of the pixels in the perceived domain's feature map takes the place of F_j. Moreover, the source dataset and the target dataset actually perceive each other, so the following expressions hold:
the class center after the first feature map is adjusted is expressed as:
the class center after the second feature map is adjusted is expressed as:
wherein,the class center representing the ith class of the source data, H representing the feature height, W representing the feature width, j representing the number of pixels, G c1 (F 1 ) Representing the first classification score graph, [ G ] c1 (F 1 )] i,j Indicating whether the jth pixel in the first classification score map belongs to the ith class, wherein the value is 1, and the value is 0; [ A ] 2 ] j Representing a feature distribution of a j-th pixel in the second feature map; />The class center, G, representing the ith class of the target data c2 (F 2 ) Representing the second classification score map, [ G ] c2 (F 2 )] i,j Indicating whether the jth pixel in the second classification score map belongs to the ith class, wherein the value is 1, and the value is 0; [ A ] 1 ] j Representing the feature distribution of the jth pixel in the first feature map. G c1 (F 1 )∈R c1×HW ,G c2 (F 2 )∈R c2×HW ,A 1 、A 2 ∈R HW×N′ Every category center->
The construction of the cross domain class center generator has two advantages. First, the feature map of the other domain's data is added to this module, so that each feature center can capture global information of the other domain's data. Second, the class center obtained by the module coordinates the consistency between each pixel and the class information, so that the class center of the current domain can be fine-tuned by this operation and becomes more compatible with the class center of the other domain. After cross domain perception, feature points with ambiguous classification are separated more clearly and are therefore easier to recognize; meanwhile, the centers of the same class in different domains become closer, so that one model can complete the segmentation of images from both domains.
Further, based on the foregoing embodiment, performing distribution adjustment on the classification-ambiguous feature points of the first feature map and the second feature map by the class attention module, to obtain a first attention feature map and a second attention feature map respectively, includes: taking the first classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the source data to obtain a first class attention feature; performing channel addition on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the target data to obtain a second class attention feature; and performing channel addition on the second class attention feature and the second feature map to obtain the second attention feature map.
In different domains, the feature distributions of some classes are similar, so not all feature points in the two domains need to be adjusted. Instead, attention should focus on the feature points whose classification is ambiguous, so that they can be classified unambiguously. Inspired by the attention mechanism, a class attention module (Class Attention Block) is constructed in an embodiment of the invention. In the class score map of the current domain, the class scores of some pixels are relatively close, i.e. the classifier is highly uncertain about which class these pixels should be assigned to, so they are more likely to be misclassified. These points need more attention and should be emphasized in subsequent processing. Therefore, following the idea of the attention mechanism, the class score map of the current domain is used as an attention map over the adjusted class centers to obtain the class attention feature, which is then added to the input through channel addition.
As shown in fig. 3, performing distribution adjustment on the classification-ambiguous feature points of the first feature map and the second feature map by the class attention module, to obtain a first attention feature map and a second attention feature map respectively, specifically includes: taking the first classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the source data to obtain a first class attention feature; performing channel addition on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the target data to obtain a second class attention feature; and performing channel addition on the second class attention feature and the second feature map to obtain the second attention feature map.
From the description of the above embodiments, the cross domain class center F_class ∈ R^{C×N′} is already available, where C represents the number of classes. The class score map G_F ∈ R^{C×H×W} of the current domain is reshaped so that its dimension becomes C×HW. Finally a class attention feature map Z_a ∈ R^{N′×HW} is obtained, which is then reshaped to Z_a ∈ R^{N′×H×W}, wherein:
the first attention profile is expressed as:
wherein,the first attention profile representing the jth pixel of the kth channel of the source image, C 1 Representing the number of categories of the source image, i representing the category number, G c1 (F 1 ) Representing the first classification score graph, [ G ] c1 (F 1 )] i,j Indicating whether the jth pixel in the first classification score map belongs to the ith class, wherein the value is 1, and the value is 0; />Representing the class center of a kth pixel of a kth channel of the source image;
the second attention profile is expressed as:
wherein,the second attention profile representing the jth pixel of the kth channel of the target image, C 2 The number of categories representing the target image, i representing the category number, G c2 (F 2 ) Representing the second classification score map, [ G ] c2 (F 2 )] i,j Indicating whether the jth pixel in the second classification score map belongs to the ith class, wherein the value is 1, and the value is 0; />Representing the class center of the kth channel jth pixel of the target image.
After the first attention feature map and the second attention feature map are obtained, a 1×1 convolutional layer can be used to fine-tune the output attention feature maps, making the result more accurate.
Therefore, the domain adaptive semantic segmentation method based on cross domain class perception designed in the embodiment of the invention can adjust the class center of the current domain according to the feature content of the perceived domain, so that the trained model adjusts the feature information of pixels whose classes are ambiguous in the classification score map. In the end, the class center of the perceiving features approaches the class center of the perceived domain, and the domain adaptation task is accomplished better.
Fig. 5 is a schematic structural diagram of a domain-adaptive semantic segmentation device based on cross domain category awareness according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes a preprocessing module 10, a feature classification module 20, a feature map adjustment module 30, and a semantic segmentation module 40, where:
the preprocessing module 10 is used for: converting the style of the source image in the source data set into the style of the target image in the target data set through a style migration network to obtain a source adaptation image; wherein the source adaptation image is consistent with tag data of the source image; the feature classification module 20 is configured to: processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature image and a first classification score image; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature image and a second classification score image; the feature map adjustment module 30 is configured to: inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module; the cross domain type perception module comprises two cross domain type perceptrons, wherein each cross domain type perceptrons comprises a cross domain type center generator and a type attention module which are sequentially connected, and the type centers of the first feature map and the second feature map are adjusted through the cross domain type center generators of the two cross domain type perceptrons respectively so that the type centers of the first feature map and the second feature map are close; the classification fuzzy feature points of the first feature map and the second feature map are respectively distributed and adjusted through the classification attention module, so that a first attention feature map and a second attention feature map are respectively obtained; the semantic segmentation module 40 is configured to: and carrying out semantic segmentation on the source image according to the first attention characteristic diagram and carrying out semantic segmentation on the target image 
according to the second attention characteristic diagram.
In the embodiment of the invention, by providing a cross domain class perception module comprising a cross domain class center generator and a class attention module, the model attends to the class centers of the other domain's data features when extracting features of one domain and, combined with the attention mechanism, adjusts the features of classification-ambiguous pixels in both domains, so that the class centers of the same class in different domains become consistent, the difference in feature distribution is reduced, and domain adaptation is realized.
The device provided in the embodiment of the present invention is used to execute the above method; for its specific functions, reference may be made to the above method flow, which is not repeated here.
Fig. 6 illustrates a physical schematic diagram of an electronic device. As shown in fig. 6, the electronic device may include: a processor 610, a communications interface (Communications Interface) 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a domain adaptive semantic segmentation method based on cross domain class awareness, the method comprising: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image, wherein the source adaptation image is consistent with the label data of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, embodiments of the present invention further provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the domain adaptive semantic segmentation method based on cross domain class awareness provided by the above method embodiments, the method comprising: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image, wherein the source adaptation image is consistent with the label data of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.
In yet another aspect, embodiments of the present invention further provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the domain adaptive semantic segmentation method based on cross domain class awareness provided by the above embodiments, the method comprising: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image, wherein the source adaptation image is consistent with the label data of the source image; processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module, wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. The domain adaptive semantic segmentation method based on cross domain category perception is characterized by comprising the following steps of:
converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image; wherein the source adaptation image is consistent with the label data of the source image;
processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map;
inputting the first feature map, the first classification score map, the second feature map and the second classification score map to a cross domain class perception module; wherein the cross domain class perception module comprises two cross domain class perceptrons, each cross domain class perceptron comprises a cross domain class center generator and a class attention module which are sequentially connected, the class centers of the first feature map and the second feature map are adjusted through the cross domain class center generators of the two cross domain class perceptrons respectively so that the class centers of the first feature map and the second feature map approach each other, and distribution adjustment is performed on the classification-ambiguous feature points of the first feature map and the second feature map through the class attention modules respectively, so that a first attention feature map and a second attention feature map are respectively obtained;
performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map; wherein the adjusting, by the cross domain class center generators of the two cross domain class perceptrons, the class centers of the first feature map and the second feature map includes:
performing an inner product operation on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map;
and performing an inner product operation on the second classification score map and the first feature map to obtain the adjusted class center of the second feature map.
2. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 1, wherein the adjusted class center of the first feature map is expressed as:

F_{class,s}^i = ( Σ_{j=1}^{H×W} [G_{c1}(F_1)]_{i,j} · [A_2]_j ) / ( Σ_{j=1}^{H×W} [G_{c1}(F_1)]_{i,j} )

and the adjusted class center of the second feature map is expressed as:

F_{class,t}^i = ( Σ_{j=1}^{H×W} [G_{c2}(F_2)]_{i,j} · [A_1]_j ) / ( Σ_{j=1}^{H×W} [G_{c2}(F_2)]_{i,j} )

wherein F_{class,s}^i denotes the class center of the ith class of the source data, H denotes the feature height, W denotes the feature width, j indexes the pixels, G_{c1}(F_1) denotes the first classification score map, and [G_{c1}(F_1)]_{i,j} indicates whether the jth pixel in the first classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [A_2]_j denotes the feature distribution of the jth pixel in the second feature map; F_{class,t}^i denotes the class center of the ith class of the target data, G_{c2}(F_2) denotes the second classification score map, and [G_{c2}(F_2)]_{i,j} indicates whether the jth pixel in the second classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [A_1]_j denotes the feature distribution of the jth pixel in the first feature map.
3. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 1, wherein performing distribution adjustment on the classification-ambiguous feature points of the first feature map and the second feature map by the class attention module, to obtain a first attention feature map and a second attention feature map respectively, specifically includes:

taking the first classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the source data to obtain a first class attention feature; performing channel addition on the first class attention feature and the first feature map to obtain the first attention feature map;

taking the second classification score map as an attention map, and performing an inner product operation on it and the adjusted class center of the target data to obtain a second class attention feature; and performing channel addition on the second class attention feature and the second feature map to obtain the second attention feature map.
4. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 3, wherein the first attention feature map is expressed as:

[Z_1]_{k,j} = Σ_{i=1}^{C1} [G_{c1}(F_1)]_{i,j} · [F_{class,s}^i]_k

wherein [Z_1]_{k,j} denotes the first attention feature map at the jth pixel of the kth channel of the source image, C1 denotes the number of classes of the source image, i denotes the class index, G_{c1}(F_1) denotes the first classification score map, and [G_{c1}(F_1)]_{i,j} indicates whether the jth pixel in the first classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [F_{class,s}^i]_k denotes the kth channel component of the adjusted class center of the ith class of the source data;

and the second attention feature map is expressed as:

[Z_2]_{k,j} = Σ_{i=1}^{C2} [G_{c2}(F_2)]_{i,j} · [F_{class,t}^i]_k

wherein [Z_2]_{k,j} denotes the second attention feature map at the jth pixel of the kth channel of the target image, C2 denotes the number of classes of the target image, i denotes the class index, G_{c2}(F_2) denotes the second classification score map, and [G_{c2}(F_2)]_{i,j} indicates whether the jth pixel in the second classification score map belongs to the ith class, taking the value 1 if so and 0 otherwise; [F_{class,t}^i]_k denotes the kth channel component of the adjusted class center of the ith class of the target data.
5. The domain adaptive semantic segmentation method based on cross domain class awareness according to claim 3, further comprising:

fine-tuning the first attention feature map and the second attention feature map with a 1×1 convolutional layer.
6. The method of claim 1, further comprising, prior to said processing the source adapted image sequentially through a first feature extraction network and a first classifier: channel compressing the source adaptation image;
before the target image is sequentially processed through the second feature extraction network and the second classifier, the method further includes: and carrying out channel compression on the target image.
7. A domain adaptive semantic segmentation device based on cross domain class perception, characterized by comprising:
the preprocessing module is used for: converting the style of the source image in the source dataset into the style of the target image in the target dataset through a style migration network to obtain a source adaptation image; wherein the source adaptation image is consistent with the label data of the source image;
the feature classification module is used for: processing the source adaptation image sequentially through a first feature extraction network and a first classifier to obtain a first feature map and a first classification score map; and processing the target image sequentially through a second feature extraction network and a second classifier to obtain a second feature map and a second classification score map;
the feature map adjustment module is used for: inputting the first feature map, the first classification score map, the second feature map and the second classification score map into a cross-domain category awareness module, wherein the cross-domain category awareness module comprises two cross-domain category perceivers, each cross-domain category perceiver comprising a cross-domain category center generator and a category attention module connected in sequence; adjusting the category centers of the first feature map and the second feature map through the cross-domain category center generators of the two cross-domain category perceivers respectively, so that the category centers of the first feature map and the second feature map are drawn close to each other; and adjusting the distributions of the ambiguously classified feature points of the first feature map and the second feature map through the category attention modules respectively, so as to obtain a first attention feature map and a second attention feature map respectively;
the semantic segmentation module is used for: performing semantic segmentation on the source image according to the first attention feature map, and performing semantic segmentation on the target image according to the second attention feature map;
the feature map adjustment module is specifically configured, when adjusting the category centers of the first feature map and the second feature map through the cross-domain category center generators of the two cross-domain category perceivers respectively, to: perform an inner product operation on the first classification score map and the second feature map to obtain the adjusted category centers of the first feature map; and perform an inner product operation on the second classification score map and the first feature map to obtain the adjusted category centers of the second feature map.
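The inner product in claim 7 amounts to aggregating one domain's features with the other domain's per-class pixel assignments; with row normalization it yields a score-weighted mean feature per category. A minimal sketch (the normalization, names, and toy values are assumptions beyond the claim's bare "inner product operation"):

```python
import numpy as np

def cross_domain_class_centers(score_map, feature_map):
    """Inner product of a (C, N) classification score map from one domain
    with a (K, N) feature map from the other domain.

    Rows of the score map are normalized so the result is a weighted mean.
    Returns (K, C): the center of category i in channel k.
    """
    weights = score_map / np.maximum(score_map.sum(axis=1, keepdims=True), 1e-8)
    return feature_map @ weights.T

# Toy example: 2 categories over 4 pixels, 1 feature channel.
score = np.array([[1.0, 1.0, 0.0, 0.0],    # pixels 0,1 -> category 0
                  [0.0, 0.0, 1.0, 1.0]])   # pixels 2,3 -> category 1
feat = np.array([[1.0, 3.0, 5.0, 7.0]])
mu = cross_domain_class_centers(score, feat)  # mean feature per category
```

Because the score map comes from one domain and the features from the other, the resulting centers mix information across domains, which is what draws the two domains' category centers together.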
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the domain-adaptive semantic segmentation method based on cross domain category awareness according to any of claims 1 to 6 when executing the computer program.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the domain-adaptive semantic segmentation method based on cross domain category awareness according to any of claims 1 to 6.
CN202010773728.3A 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception Active CN112016592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010773728.3A CN112016592B (en) 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010773728.3A CN112016592B (en) 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception

Publications (2)

Publication Number Publication Date
CN112016592A CN112016592A (en) 2020-12-01
CN112016592B true CN112016592B (en) 2024-01-26

Family

ID=73499087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010773728.3A Active CN112016592B (en) 2020-08-04 2020-08-04 Domain adaptive semantic segmentation method and device based on cross domain category perception

Country Status (1)

Country Link
CN (1) CN112016592B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205096B (en) 2021-04-26 2022-04-15 武汉大学 Attention-based combined image and feature self-adaptive semantic segmentation method
CN112990378B (en) * 2021-05-08 2021-08-13 腾讯科技(深圳)有限公司 Scene recognition method and device based on artificial intelligence and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011072259A1 (en) * 2009-12-10 2011-06-16 Indiana University Research & Technology Corporation System and method for segmentation of three dimensional image data
CN108960260A (en) * 2018-07-12 2018-12-07 东软集团股份有限公司 A kind of method of generating classification model, medical image image classification method and device
CN110399856A (en) * 2019-07-31 2019-11-01 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and its equipment
CN110991516A (en) * 2019-11-28 2020-04-10 哈尔滨工程大学 Side-scan sonar image target classification method based on style migration
CN111340039A (en) * 2020-02-12 2020-06-26 杰创智能科技股份有限公司 Target detection method based on feature selection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160286156A1 (en) * 2015-02-12 2016-09-29 Creative Law Enforcement Resources, Inc. System for managing information related to recordings from video/audio recording devices


Also Published As

Publication number Publication date
CN112016592A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
US12039440B2 (en) Image classification method and apparatus, and image classification model training method and apparatus
CN111160264B (en) Cartoon character identity recognition method based on generation countermeasure network
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN112560831A (en) Pedestrian attribute identification method based on multi-scale space correction
CN112836625A (en) Face living body detection method and device and electronic equipment
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN110211127A (en) Image partition method based on bicoherence network
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
Xiao et al. Pedestrian object detection with fusion of visual attention mechanism and semantic computation
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
CN117275074A (en) Facial expression recognition method based on broad attention and multi-scale fusion mechanism
Zia et al. An adaptive training based on classification system for patterns in facial expressions using SURF descriptor templates
Vijayalakshmi K et al. Copy-paste forgery detection using deep learning with error level analysis
CN113850182B (en) DAMR _ DNet-based action recognition method
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN113435315A (en) Expression recognition method based on double-path neural network feature aggregation
Cai et al. Vehicle detection based on visual saliency and deep sparse convolution hierarchical model
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN114332884B (en) Document element identification method, device, equipment and storage medium
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN114972965A (en) Scene recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant