CN111274999A - Data processing method, image processing method, device and electronic equipment - Google Patents
- Publication number: CN111274999A
- Application number: CN202010097917.3A
- Authority
- CN
- China
- Prior art keywords
- attention
- feature map
- module
- feature
- map
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention provides a data processing method, an image processing method, corresponding devices, and electronic equipment in the technical field of image processing. The method comprises: obtaining a target feature map of an image to be processed; performing dimension reduction and relevance reassignment on the target feature map through a spatial position feature attention module to obtain a spatial attention result map, which is a feature map obtained by weighting each spatial position feature of the target feature map; and processing the spatial attention result map and the target feature map through a channel attention module to obtain a channel attention distribution map, which represents the importance of each feature channel in the target feature map. This solves the technical problems that a conventional convolutional network model cannot effectively obtain global information and that existing attention models are computationally expensive.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a data processing method, an image processing method, corresponding devices, and an electronic apparatus.
Background
Face recognition is a basic task in the field of computer vision: a technology that recognizes or verifies the identity of a subject in an image or video. The object of face recognition is typically a frame of image; processing it directly with an unstructured neural network consumes considerable time and GPU memory. Convolutional neural networks arose to address this.
The main backbone models applicable to the face recognition task include ResNet, ResNeXt, MobileNet, ShuffleNet, VGG, GoogLeNet and the like. These models share the strategy of expanding the receptive field layer by layer, which saves computation and memory, but the local receptive field used in each layer raises two problems: global information is lost, and the features of different channels are not distinguished. Local convolution also cannot determine the region size in advance, and the frequent cutting and splicing operations it requires harm efficiency. SENet does not model or correlate feature information across different spatial locations. The non-local method is computationally expensive. GCNet combines SENet and the non-local method, simplifying the non-local module based on the observation that attention maps at different positions are almost identical, but it still does not compute the feature relations between different spatial positions simply enough.
In summary, existing face recognition algorithms have the following disadvantages: local receptive fields limit the model's ability to obtain global information; local convolution cannot determine the convolution range, and its frequent cutting and splicing operations harm efficiency; and existing attention models are either coarse in design or computationally expensive.
Disclosure of Invention
In view of the above, the present invention provides a data processing method, an image processing method, corresponding devices, and an electronic apparatus, so as to solve the technical problems that a conventional convolutional network model cannot effectively obtain global information and that existing attention models are computationally expensive.
In a first aspect, an embodiment of the present invention provides a data processing method, including: acquiring a target characteristic diagram of an image to be processed; performing dimensionality reduction and relevance redistribution processing on the target feature map through a spatial position feature attention module to obtain a spatial attention result map, wherein the spatial attention result map is a feature map obtained after weighting calculation is performed on each spatial position feature of the target feature map; and processing the space attention result graph and the target feature graph through a channel attention module to obtain a channel attention distribution graph, wherein the channel attention distribution graph represents the importance degree of each feature channel in the target feature graph.
Further, the step of performing dimension reduction and relevance reassignment on the target feature map to obtain a spatial attention result map comprises: performing dimension reduction on the target feature map to obtain a first dimension-reduction feature map, wherein the size of the target feature map is C × H × W, the size of the first dimension-reduction feature map is C × S, and C is the number of channels of the target feature map; performing dimension reduction on the first dimension-reduction feature map in the channel-number dimension to obtain a second dimension-reduction feature map; operating on the second dimension-reduction feature map to obtain the relevance-reassigned weight value matrix of the first dimension-reduction feature map, wherein the weight value matrix comprises a weight value for each spatial position feature in each feature channel of the target feature map; and fusing the weight value matrix with the first dimension-reduction feature map to obtain the spatial attention result map.
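As a concrete illustration, the four claimed steps can be sketched end to end in NumPy. The shapes, pooling bins, and random weights below are assumptions made for the sketch, not values fixed by the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
target = rng.standard_normal((C, H, W))          # target feature map, C x H x W

# Step 1: reduce each H x W map to 1 x S via multi-scale average pooling
def pyramid_pool(fmap, bins=(1, 2, 4)):
    c, h, w = fmap.shape
    parts = []
    for b in bins:
        pooled = fmap.reshape(c, b, h // b, b, w // b).mean(axis=(2, 4))
        parts.append(pooled.reshape(c, -1))      # c x (b*b) per scale
    return np.concatenate(parts, axis=1)         # c x S, S = 1 + 4 + 16 = 21

first = pyramid_pool(target)                     # first dimension-reduction map, C x S

# Step 2: reduce the channel dimension C -> 1 (stand-in for a 1x1 convolution)
conv_w = rng.standard_normal(C)
second = conv_w @ first                          # second dimension-reduction map, S values

# Step 3: softmax over the S positions yields the reassigned weight values
weights = np.exp(second - second.max())
weights /= weights.sum()

# Step 4: fuse the weights with the first reduced map by matrix multiplication
spatial_attention = first @ weights              # spatially weighted features, length C
print(spatial_attention.shape)                   # (4,)
```

The per-step choices (average pooling, bin sizes, a single output channel) are one plausible reading of the claims, sketched only to make the data flow concrete.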
Further, the spatial position feature attention module includes: a feature pyramid pooling module, a convolution module, and a softmax processing module.
Further, performing dimension reduction processing on the target feature map to obtain a first dimension reduction feature map includes: and performing dimension reduction processing on the target feature map through the feature pyramid pooling module to obtain a first dimension reduction feature map.
Further, in the dimension of the number of channels, performing dimension reduction processing on the first dimension reduction feature map to obtain a second dimension reduction feature map includes: and performing dimensionality reduction processing on the first dimensionality reduction feature map through the convolution module in the dimensionality of the channel number to obtain a second dimensionality reduction feature map, wherein the channel number of the second dimensionality reduction feature map is smaller than the channel number of the first dimensionality reduction feature map.
Further, operating on the second dimension-reduction feature map to obtain the relevance-reassigned weight value matrix of the first dimension-reduction feature map comprises: operating on the second dimension-reduction feature map through the softmax processing module to obtain the relevance-reassigned weight value matrix of the first dimension-reduction feature map.
Further, the fusing the weight value and the first dimension reduction feature map to obtain the spatial attention result map includes: and calculating a product between the weight value matrix and the first dimension reduction feature map, and determining a product calculation result as the space attention result map.
Further, the channel attention module includes: a full connection module and a sigmoid module.
Further, the processing the spatial attention result graph and the target feature graph by the channel attention module to obtain a channel attention distribution graph includes: performing full-connection calculation on the space attention result graph through the full-connection module to obtain a full-connection calculation result; carrying out normalization calculation on the full-connection calculation result through the sigmoid module to obtain a normalization calculation result; and fusing the normalization calculation result and the target characteristic diagram to obtain the channel attention distribution diagram.
Further, fusing the normalized calculation result and the target feature map includes: and performing dot product calculation on the normalization calculation result and the target characteristic graph to obtain the channel attention distribution graph.
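A minimal NumPy sketch of this channel attention path (fully connected calculation, sigmoid normalization, dot-product fusion); the layer sizes and random weights are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 4, 8, 8
target = rng.standard_normal((C, H, W))      # target feature map
spatial_result = rng.standard_normal(C)      # per-channel spatial attention features

# Fully connected calculation on the spatial attention result
fc_w = rng.standard_normal((C, C))
fc_out = fc_w @ spatial_result

# Sigmoid normalization: per-channel importance in (0, 1)
channel_weights = 1.0 / (1.0 + np.exp(-fc_out))

# Dot-product fusion with the target feature map: scale each feature channel
channel_attention_map = channel_weights[:, None, None] * target
print(channel_attention_map.shape)           # (4, 8, 8)
```

Here the "dot product" is read as an elementwise scaling of each channel of the target map by its normalized weight, which matches the stated purpose of ranking channel importance.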
In a second aspect, an embodiment of the present invention provides an image processing method, including: acquiring an image to be identified; extracting characteristic information of the image to be recognized through a target recognition model; wherein the target recognition model comprises a spatial location feature attention module and a channel attention module, the spatial location feature attention module and the channel attention module are arranged in at least one convolution module in the target recognition model, and the spatial location feature attention module and the channel attention module process data according to the method of any one of the first aspect; and identifying the image to be identified through the characteristic information to obtain an identification result.
In a third aspect, an embodiment of the present invention provides a data processing apparatus, including: the first acquisition unit is used for acquiring a target characteristic map of an image to be processed; the first processing unit is used for performing dimensionality reduction and relevance reassignment processing on the target feature map through a spatial position feature attention module to obtain a spatial attention result map, wherein the spatial attention result map is a feature map obtained after weighted calculation is performed on each spatial position feature of the target feature map; and the second processing unit is used for processing the space attention result graph and the target feature graph through the channel attention module to obtain a channel attention distribution graph, wherein the channel attention distribution graph represents the importance degree of each feature channel in the target feature graph.
In a fourth aspect, an embodiment of the present invention provides an image processing apparatus, including: a second acquisition unit configured to acquire an image to be identified; a feature extraction unit configured to extract feature information of the image to be recognized through a target recognition model, wherein the target recognition model comprises a spatial position feature attention module and a channel attention module, and the spatial position feature attention module and the channel attention module are arranged in at least one convolution module in the target recognition model and process data according to the method of any one of the first aspect; and a third processing unit configured to perform recognition processing on the image to be recognized through the feature information to obtain a recognition result.
In a fifth aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of the above first aspects or the steps of the method according to the above second aspect when executing the computer program.
In a sixth aspect, an embodiment of the present invention provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to perform the steps of the method in any one of the above first aspects or the steps of the method in the above second aspect.
In the embodiment of the invention, first, a target feature map of an image to be processed is obtained; second, dimension reduction and relevance reassignment are performed on the target feature map through a spatial position feature attention module to obtain a spatial attention result map; and finally, the spatial attention result map and the target feature map are processed through a channel attention module to obtain a channel attention distribution map. As described above, the spatial position feature attention module exploits the similarity of neighboring position features and the observation that attention maps at different positions are substantially similar, and reduces the dimensions of the target feature map to simplify it, so that the relations between spatial position features can be obtained quickly. On this basis, the channel attention module then acts jointly on the input features to obtain the final global information. This solves the technical problems that a conventional convolutional network model cannot effectively obtain global information and that existing attention models are computationally expensive, and improves accuracy without increasing the amount of calculation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the invention;
FIG. 2 is a flow chart of a method of data processing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a spatial locality feature attention module and a channel attention module according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of image processing according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an application model of an attention module in accordance with an embodiment of the invention;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
first, an example electronic device 100 for implementing a data processing method of an embodiment of the present invention is described with reference to fig. 1.
As shown in fig. 1, electronic device 100 includes one or more processors 102 and one or more memories 104. Optionally, the electronic device may also include an input device 106, an output device 108, and a camera 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), and an Application Specific Integrated Circuit (ASIC). The processor 102 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The camera 110 is configured to acquire an image to be processed, which is then processed by the data processing method to obtain a channel attention distribution map. For example, the camera may capture an image desired by a user (e.g., a photo or a video frame), which is then processed by the data processing method to obtain the channel attention distribution map. The camera may also store the captured image in the memory 104 for use by other components.
Exemplarily, an exemplary electronic device for implementing the data processing method according to the embodiment of the present invention may be implemented on a mobile terminal such as a smartphone, a tablet computer, or the like.
Example 2:
in accordance with an embodiment of the present invention, there is provided an embodiment of a data processing method, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than that herein.
Fig. 2 is a flow chart of a data processing method according to an embodiment of the present invention, as shown in fig. 2, the method includes the steps of:
step S202, acquiring a target characteristic diagram of an image to be processed; the size of the target feature map may be C, H, and W, where C is the number of feature channels of the target feature map, and H and W are the length and width of the target feature map.
Step S204, performing dimensionality reduction and relevance reassignment processing on the target feature map through a spatial position feature attention module to obtain a spatial attention result map, wherein the spatial attention result map is a feature map obtained after weighting calculation is performed on each spatial position feature of the target feature map. In the present application, performing dimension reduction processing on the target feature map means that dimension reduction processing can be performed on the target feature based on the length and the width, for example, reducing the target feature map with the size of H × W to a feature map with the size of 1 × S; in addition, the number of feature channels of the target feature map may be reduced to 1 from C, and the dimension reduction process will be described in detail in the following embodiments.
Step S206, processing the spatial attention result graph and the target feature graph through a channel attention module to obtain a channel attention distribution graph, where the channel attention distribution graph represents the importance degree of each feature channel in the target feature graph. After the spatial attention result graph is obtained, the global information of the image to be processed can be obtained by processing the spatial attention result graph and the target feature graph through the channel attention module.
In the embodiment of the invention, firstly, a target characteristic diagram of an image to be processed is obtained; then, performing dimension reduction processing on the target feature map through a spatial position feature attention module to obtain a spatial attention result map; and finally, processing the space attention result graph and the target characteristic graph through a channel attention module to obtain a channel attention distribution graph. According to the description, in the application, the spatial position feature attention module considers the similar characteristics of the neighborhood position features and the phenomenon that the attention diagrams at different positions are basically similar, performs dimension reduction processing on the target feature diagram to simplify the feature diagram strategy, so that the relation between the spatial position features can be quickly obtained, then on the basis, the channel attention module is added to jointly act on the input features to obtain the final global information, the technical problem that the traditional convolutional network model cannot effectively obtain the global information and the traditional attention model has large calculation amount is solved, and the accuracy can be improved under the condition that the calculation amount is not increased.
As shown in fig. 3, which is a schematic structural diagram of the spatial location feature attention module and the channel attention module, as can be seen from fig. 3, the spatial location feature attention module includes: the device comprises a characteristic pyramid pooling module, a convolution module and a softmax processing module. As shown in fig. 5, the channel attention module includes: a full connection module and a sigmoid module.
In an optional embodiment, in step S204, the dimension reduction processing is performed on the target feature map to obtain a spatial attention result map, which specifically includes the following processes:
step S2041, performing dimensionality reduction processing on the target feature map to obtain a first dimensionality reduction feature map, where the size of the target feature map is C × H × W, the size of the first dimensionality reduction feature map is C × S, and C is the number of channels of the target feature map.
In the present application, assume the size of the target feature map is C × H × W; after dimension reduction, a first dimension-reduction feature map of size C × S is obtained. That is, each H × W feature map is reduced to a feature map of size 1 × S. The basis for this dimension reduction is that neighboring positions in the target feature map represent similar information: a given pixel in the target feature map is similar to the pixels in its neighborhood.
According to the description, the calculation amount of the attention model can be simplified by performing the dimension reduction operation on the target feature map based on the similarity of the information represented by the neighborhood positions in the target feature map.
Optionally, in this application, the feature pyramid pooling module may perform dimension reduction on the target feature map to obtain a first dimension reduction feature map.
As can be seen from fig. 3, for the feature map of each feature channel in the target feature map, the feature pyramid pooling module performs downscaling processing on the feature map to obtain a plurality of multi-scale feature maps, then converts the multi-scale feature maps into 1 × n-dimensional features, where n corresponding to the feature maps of different scales is different, and finally splices the 1 × n-dimensional features corresponding to the multi-scale feature maps to obtain 1 × S-dimensional features, and if the number of feature channels is C, the feature pyramid pooling module processes the target feature map to obtain a C × S-sized feature map, that is, a first downscaled feature map.
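The pooling-and-splicing just described might be sketched as follows; the bin sizes (1, 2, 4) and the input shape are illustrative assumptions, since the patent does not fix the scales:

```python
import numpy as np

def feature_pyramid_pool(fmap, bins=(1, 2, 4)):
    """Reduce each H x W channel map to a 1 x S vector, S = sum(b*b for b in bins)."""
    c, h, w = fmap.shape
    parts = []
    for b in bins:
        # average-pool the map down to a b x b grid, then flatten to 1 x (b*b)
        pooled = fmap.reshape(c, b, h // b, b, w // b).mean(axis=(2, 4))
        parts.append(pooled.reshape(c, -1))
    # splice the per-scale vectors into the C x S first dimension-reduction map
    return np.concatenate(parts, axis=1)

fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # C=2 toy feature map
first = feature_pyramid_pool(fmap)
print(first.shape)  # (2, 21)
```

With bins (1, 2, 4), each channel's n values per scale are 1, 4, and 16, so S = 21 here; any other set of scales would change S but not the structure of the operation.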
Step S2042, performing dimension reduction processing on the first dimension reduction feature map in the dimension of the number of channels to obtain a second dimension reduction feature map, where the number of channels of the second dimension reduction feature map is smaller than the number of channels of the first dimension reduction feature map.
After the target feature map is reduced to the first dimension-reduction feature map, the C × S features (i.e., the first dimension-reduction feature map) can be reduced in the C dimension through a 1 × 1 convolution operation to obtain a second dimension-reduction feature map of size 1 × S. That is, the dimension reduction of the first dimension-reduction feature map reduces its number of channels from C to 1.
It should be noted that, in the present application, the theoretical basis for performing dimension reduction on the first dimension-reduction feature map is that the attention maps of different spatial position features are substantially similar. The first dimension-reduction feature map comprises C × S spatial position features whose corresponding attention maps are substantially similar, so the first dimension-reduction feature map can be reduced on this basis to obtain a second dimension-reduction feature map of size 1 × S.
As can be seen from the above description, the attention maps based on different spatial location features are substantially similar, and the way of performing the dimension reduction operation on the first dimension reduction feature map can further simplify the amount of calculation of the attention model based on the above step S2041.
Optionally, in this application, the convolution module performs dimensionality reduction processing on the first dimensionality reduction feature map in the dimensionality of the number of channels to obtain a second dimensionality reduction feature map, where the number of channels of the second dimensionality reduction feature map is smaller than the number of channels of the first dimensionality reduction feature map.
As shown in fig. 3, after the feature map with the size of C × S (i.e., the first dimension-reduced feature map) is obtained, dimension reduction processing may be performed on the first dimension-reduced feature map through a convolution module of 1 × 1, so as to obtain a second dimension-reduced feature map with the size of 1 × S.
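As a rough illustration of this step: a 1 × 1 convolution with a single output channel acts on each of the S columns of a C × S map independently, so it reduces to a learned weighted sum of the C channel rows. The sizes and random weights below are hypothetical placeholders, not values from the application; this is a minimal NumPy sketch of the operation, not the patented implementation.

```python
import numpy as np

# Hypothetical sizes: C channels, S pooled spatial positions (not from the application).
C, S = 8, 21
rng = np.random.default_rng(0)
first_reduced = rng.standard_normal((C, S))   # first dimension reduction feature map, C x S

# A 1 x 1 convolution with one output channel is equivalent to multiplying
# the C x S map by a learned (1, C) weight matrix.
conv_1x1_weight = rng.standard_normal((1, C))
second_reduced = conv_1x1_weight @ first_reduced   # second dimension reduction feature map

print(second_reduced.shape)  # (1, 21)
```

The channel count drops from C to 1 while the S spatial positions are preserved, matching the 1 × S shape described above.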
Step S2043, operating a second dimension reduction feature map to obtain a weight value matrix with the re-distributed association degree of the first dimension reduction feature map, where the weight value matrix includes weight values of each spatial position feature in each feature channel of the target feature map.
Optionally, in this application, the softmax processing module may operate on the second dimension reduction feature map to obtain a weight value matrix to which the association degree of the first dimension reduction feature map is redistributed.
As shown in fig. 3, after performing dimension reduction processing on the first dimension reduction feature map to obtain a second dimension reduction feature map, weights of features at different positions in space, that is, a weight value matrix, may be obtained through a softmax processing module.
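The softmax step can be sketched as follows; `second_reduced` stands in for the 1 × S map from the previous step, with made-up numbers for illustration. Each entry of the resulting weight value matrix is a normalized weight for one spatial position.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

second_reduced = np.array([[2.0, 1.0, 0.0, -1.0]])  # stand-in 1 x S map, S = 4
weight_matrix = softmax(second_reduced)

# The weights are positive and sum to 1 over the S spatial positions.
print(round(weight_matrix.sum(), 6))  # 1.0
```

Larger entries of the input map receive larger weights, which is how the association degree of the first dimension reduction feature map is redistributed across spatial positions.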
And step S2044, fusing the weight value and the first dimension reduction feature map to obtain the space attention result map.
Optionally, in this application, a product between the weight value matrix and the first dimension reduction feature map is calculated, and a product calculation result is determined as the spatial attention result map.
As shown in fig. 3, after the weight value matrix is obtained, a spatially weighted feature is finally obtained through a matrix multiplication operation; that is, the weight value matrix and the first dimension reduction feature map are multiplied to obtain the spatial attention result map. The computed spatial attention result map carries the spatial portion of the global information of the target feature map.
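Under the shapes used above, this fusion is a single matrix product: the C × S first dimension reduction feature map times the transposed 1 × S weight value matrix gives a C × 1 spatially weighted descriptor. A hedged NumPy sketch, with uniform weights standing in for a softmax output:

```python
import numpy as np

C, S = 8, 21
rng = np.random.default_rng(1)
first_reduced = rng.standard_normal((C, S))
weight_matrix = np.full((1, S), 1.0 / S)  # placeholder: uniform attention weights

# Weighted sum over the S spatial positions: (C, S) @ (S, 1) -> (C, 1).
spatial_attention_result = first_reduced @ weight_matrix.T
print(spatial_attention_result.shape)  # (8, 1)
```

With uniform weights the result is simply the per-channel mean over spatial positions; a real softmax output would emphasize the most relevant positions instead.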
In the present application, after the dimension reduction processing is performed on the target feature map in the manner described in the above step S2041 to step S2044 to obtain the spatial attention result map, the channel attention module may process the spatial attention result map and the target feature map to obtain the channel attention distribution map.
In an alternative embodiment, in step S206, the processing the spatial attention result map and the target feature map by the channel attention module to obtain the channel attention distribution map includes the following steps:
step S2061, performing full-connection calculation on the space attention result graph through the full-connection module to obtain a full-connection calculation result;
step S2062, carrying out normalization calculation on the full-connection calculation result through the sigmoid module to obtain a normalization calculation result;
and S2063, fusing the normalization calculation result and the target characteristic diagram to obtain the channel attention distribution diagram. Specifically, the normalized calculation result and the target feature map may be subjected to dot product calculation to obtain the channel attention distribution map.
As shown in fig. 3, the feature maps produced by different kernels differ in importance, but the original convolutional neural network does not distinguish between them: all feature maps are treated as equally important. On this basis, in the present application, the goal of making different selections for different channel features is achieved by the joint action of an fc layer (i.e., the fully connected module) and a sigmoid module. Here, making different selections for different channel features means determining on which channels the extracted features are critical.
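The fc-plus-sigmoid gating of steps S2061 to S2063 can be sketched as below. The square C × C fc weight matrix and the random inputs are assumptions for illustration only; the application does not specify layer sizes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C, H, W = 8, 4, 4
rng = np.random.default_rng(2)
target_map = rng.standard_normal((C, H, W))    # target feature map
spatial_result = rng.standard_normal((C, 1))   # C x 1 result from the spatial branch

# Fully connected layer (assumed C -> C) followed by sigmoid normalization:
fc_weight = rng.standard_normal((C, C))
channel_gates = sigmoid(fc_weight @ spatial_result)  # (C, 1), one gate in (0, 1) per channel

# Dot-product fusion: scale every channel of the target map by its gate.
channel_attention_map = target_map * channel_gates[:, :, None]
print(channel_attention_map.shape)  # (8, 4, 4)
```

Channels with gates near 1 pass through almost unchanged, while channels with gates near 0 are suppressed, which is the per-channel selection described above.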
As described above, in the present application the spatial position feature attention module exploits the similarity of features at neighboring positions and the fact that the attention maps at different positions are substantially similar: it performs dimension reduction processing on the target feature map to simplify it, so that the relations between spatial position features can be obtained quickly. On this basis, the channel attention module is added to act jointly on the input features to obtain the final global information. This solves the technical problems that a conventional convolutional network model cannot effectively obtain global information and that a conventional attention model requires a large amount of calculation, and accuracy can be improved without increasing the amount of calculation.
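Putting the pieces together, the pipeline of steps S2041 through S2063 can be sketched end to end. All pool sizes, layer shapes, and random weights are illustrative assumptions; in particular, average pooling over a small pyramid stands in for the feature pyramid pooling module, whose exact configuration is not specified here, and H and W are assumed divisible by each pool size.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_block(target, pool_sizes=(1, 2, 4), seed=0):
    """Sketch of the spatial-position + channel attention pipeline on a C x H x W map."""
    C, H, W = target.shape
    rng = np.random.default_rng(seed)

    # S2041: pyramid average pooling, C x H x W -> C x S with S = sum(p * p).
    columns = []
    for p in pool_sizes:
        hs, ws = H // p, W // p
        pooled = target[:, :hs * p, :ws * p].reshape(C, p, hs, p, ws).mean(axis=(2, 4))
        columns.append(pooled.reshape(C, -1))
    first_reduced = np.concatenate(columns, axis=1)      # C x S

    # S2042: 1 x 1 convolution over channels, C x S -> 1 x S.
    second_reduced = rng.standard_normal((1, C)) @ first_reduced

    # S2043: softmax over the S positions -> weight value matrix.
    weights = softmax(second_reduced)

    # S2044: weighted sum over positions -> C x 1 spatial attention result.
    spatial_result = first_reduced @ weights.T

    # S2061-S2063: fc + sigmoid gates, applied channel-wise to the input map.
    gates = sigmoid(rng.standard_normal((C, C)) @ spatial_result)
    return target * gates[:, :, None]

out = attention_block(np.ones((8, 4, 4)))
print(out.shape)  # (8, 4, 4)
```

The block preserves the input shape, so it can be dropped after a convolution stage without changing the surrounding architecture, consistent with the embedding described in the embodiments.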
Example 3:
fig. 4 is a flowchart of an image processing method according to an embodiment of the present invention, as shown in fig. 4, the method including the steps of:
step S402, acquiring an image to be identified;
step S404, extracting characteristic information of the image to be recognized through a target recognition model; wherein the target recognition model comprises a spatial location feature attention module and a channel attention module, the spatial location feature attention module and the channel attention module are arranged in at least one convolution module in the target recognition model, and the spatial location feature attention module and the channel attention module process data according to the method described in embodiment 2 above;
and step S406, identifying the image to be identified through the characteristic information to obtain an identification result.
It should be noted that, in the present application, the target recognition model may be trained in the following manner:
Taking a MobileFaceNet network as an example, the application of the attention module (GAM) proposed by the present invention can be described as follows:
S1, the MobileFaceNet network model is based on the bottleneck structure;
S2, a GAM module is added at the last layer of each bottleneck structure, where the GAM shown in the bottleneck diagram is the attention module, and the attention module comprises: a spatial position feature attention module and a channel attention module.
S3, the training and test sets of the target recognition model are determined: the model is trained on the ImageNet image data set, and its performance is tested on the MegaFace data set.
S4, after a set of training parameters is designed, training of the target recognition model can begin; after every n iterations, the model result is tested once.
S5, the weight parameters that perform best in the test results are selected as the model weights.
In the method, an image to be recognized is first acquired; feature information of the image to be recognized is then extracted through a target recognition model; finally, the image to be recognized is recognized based on the feature information to obtain a recognition result. As described above, the spatial position feature attention module exploits the similarity of features at neighboring positions and the fact that the attention maps at different positions are substantially similar: it performs dimension reduction processing on the target feature map to simplify it, so that the relations between spatial position features can be obtained quickly. On this basis, the channel attention module is added to act jointly on the input features to obtain the final global information, which solves the technical problems that a conventional convolutional network model cannot effectively obtain global information and that a conventional attention model requires a large amount of calculation. After the spatial position feature attention module and the channel attention module of embodiment 2 above are embedded into the target recognition model, accuracy can be improved without increasing the amount of calculation.
Example 4:
the embodiment of the present invention further provides a data processing apparatus, which is mainly used for executing the data processing method provided by the foregoing content of the embodiment of the present invention, and the data processing apparatus provided by the embodiment of the present invention is specifically described below.
Fig. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention, as shown in fig. 6, the data processing apparatus mainly includes a first obtaining unit 61, a first processing unit 62, and a second processing unit 63, wherein:
a first obtaining unit 61, configured to obtain a target feature map of an image to be processed;
the first processing unit 62 is configured to perform dimensionality reduction and relevance reallocation processing on the target feature map through a spatial location feature attention module to obtain a spatial attention result map, where the spatial attention result map is a feature map obtained after performing weighting calculation on each spatial location feature of the target feature map;
a second processing unit 63, configured to process the spatial attention result map and the target feature map through a channel attention module to obtain a channel attention distribution map, where the channel attention distribution map represents an importance degree of each feature channel in the target feature map.
In the embodiment of the invention, a target feature map of an image to be processed is first acquired; the target feature map is then subjected to dimension reduction processing through a spatial position feature attention module to obtain a spatial attention result map; finally, the spatial attention result map and the target feature map are processed through a channel attention module to obtain a channel attention distribution map. As described above, the spatial position feature attention module exploits the similarity of features at neighboring positions and the fact that the attention maps at different positions are substantially similar: it performs dimension reduction processing on the target feature map to simplify it, so that the relations between spatial position features can be obtained quickly. On this basis, the channel attention module is added to act jointly on the input features to obtain the final global information. This solves the technical problems that a conventional convolutional network model cannot effectively obtain global information and that a conventional attention model requires a large amount of calculation, and accuracy can be improved without increasing the amount of calculation.
Optionally, the first processing unit is configured to: performing dimensionality reduction on the target feature map to obtain a first dimensionality reduction feature map, wherein the size of the target feature map is C H W, the size of the first dimensionality reduction feature map is C S, and C is the channel number of the target feature map; performing dimensionality reduction processing on the first dimensionality reduction feature map on the dimensionality of the number of the channels to obtain a second dimensionality reduction feature map; operating a second dimension reduction characteristic diagram to obtain a weight value matrix redistributed by the association degree of the first dimension reduction characteristic diagram, wherein the weight value matrix comprises the weight value of each space position characteristic in each characteristic channel of the target characteristic diagram; and fusing the weight value and the first dimension reduction feature map to obtain the space attention result map.
Optionally, the spatial location feature attention module comprises: a feature pyramid pooling module, a convolution module, and a softmax processing module.
Optionally, the first processing unit is further configured to: and performing dimension reduction processing on the target feature map through the feature pyramid pooling module to obtain a first dimension reduction feature map.
Optionally, the first processing unit is further configured to: and performing dimensionality reduction processing on the first dimensionality reduction feature map through the convolution module in the dimensionality of the channel number to obtain a second dimensionality reduction feature map, wherein the channel number of the second dimensionality reduction feature map is smaller than the channel number of the first dimensionality reduction feature map.
Optionally, the first processing unit is further configured to: and operating the second dimension reduction characteristic diagram through the softmax processing module to obtain a weight value matrix redistributed by the association degree of the first dimension reduction characteristic diagram.
Optionally, the first processing unit is further configured to: and calculating a product between the weight value matrix and the first dimension reduction feature map, and determining a product calculation result as the space attention result map.
Optionally, the channel attention module comprises: a full connection module and a sigmoid module.
Optionally, the second processing unit is configured to: performing full-connection calculation on the space attention result graph through the full-connection module to obtain a full-connection calculation result; carrying out normalization calculation on the full-connection calculation result through the sigmoid module to obtain a normalization calculation result; and fusing the normalization calculation result and the target characteristic diagram to obtain the channel attention distribution diagram.
Optionally, the second processing unit is further configured to: and performing dot product calculation on the normalization calculation result and the target characteristic graph to obtain the channel attention distribution graph.
Example 5:
an embodiment of the present invention further provides an image processing apparatus, which is mainly used for executing the image processing method provided by the foregoing content of the embodiment of the present invention, and the image processing apparatus provided by the embodiment of the present invention is specifically described below.
Fig. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention, which mainly includes, as shown in fig. 7, a second acquisition unit 71, a feature extraction unit 72, and a third processing unit 73, wherein:
a second acquiring unit 71 configured to acquire an image to be recognized;
a feature extraction unit 72, configured to extract feature information of the image to be recognized through a target recognition model; the target recognition model comprises a spatial position feature attention module and a channel attention module, wherein the spatial position feature attention module and the channel attention module are arranged in at least one convolution module in the target recognition model, and the spatial position feature attention module and the channel attention module process data according to the method of any one of the embodiment 2;
and the third processing unit 73 is configured to perform recognition processing on the image to be recognized through the feature information to obtain a recognition result.
In the method, an image to be recognized is first acquired; feature information of the image to be recognized is then extracted through a target recognition model; finally, the image to be recognized is recognized based on the feature information to obtain a recognition result. As described above, the spatial position feature attention module exploits the similarity of features at neighboring positions and the fact that the attention maps at different positions are substantially similar: it performs dimension reduction processing on the target feature map to simplify it, so that the relations between spatial position features can be obtained quickly. On this basis, the channel attention module is added to act jointly on the input features to obtain the final global information, which solves the technical problems that a conventional convolutional network model cannot effectively obtain global information and that a conventional attention model requires a large amount of calculation. After the spatial position feature attention module and the channel attention module of embodiment 2 above are embedded into the target recognition model, accuracy can be improved without increasing the amount of calculation.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the foregoing method embodiments; for the sake of brevity, where the device embodiment is not described in detail, reference may be made to the corresponding content in the method embodiments.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (15)
1. A data processing method, comprising:
acquiring a target characteristic diagram of an image to be processed;
performing dimensionality reduction and relevance redistribution processing on the target feature map through a spatial position feature attention module to obtain a spatial attention result map, wherein the spatial attention result map is a feature map obtained after weighting calculation is performed on each spatial position feature of the target feature map;
and processing the space attention result graph and the target feature graph through a channel attention module to obtain a channel attention distribution graph, wherein the channel attention distribution graph represents the importance degree of each feature channel in the target feature graph.
2. The method of claim 1, wherein performing dimension reduction and relevance redistribution processing on the target feature map to obtain a spatial attention result map comprises:
performing dimension reduction on the target feature map to obtain a first dimension reduction feature map, wherein the size of the target feature map is C × H × W, the size of the first dimension reduction feature map is C × S, and C is the number of channels of the target feature map;
performing dimensionality reduction processing on the first dimensionality reduction feature map on the dimensionality of the number of the channels to obtain a second dimensionality reduction feature map;
operating a second dimension reduction characteristic diagram to obtain a weight value matrix redistributed by the association degree of the first dimension reduction characteristic diagram, wherein the weight value matrix comprises the weight value of each space position characteristic in each characteristic channel of the target characteristic diagram;
and fusing the weight value and the first dimension reduction feature map to obtain the space attention result map.
3. The method of claim 2, wherein the spatial location feature attention module comprises: a feature pyramid pooling module, a convolution module, and a softmax processing module.
4. The method according to claim 3, wherein performing dimension reduction processing on the target feature map to obtain a first dimension reduction feature map comprises:
and performing dimension reduction processing on the target feature map through the feature pyramid pooling module to obtain a first dimension reduction feature map.
5. The method of claim 3, wherein performing dimension reduction on the first dimension reduction feature map in the dimension of the number of channels to obtain a second dimension reduction feature map comprises:
and performing dimensionality reduction processing on the first dimensionality reduction feature map through the convolution module in the dimensionality of the channel number to obtain a second dimensionality reduction feature map, wherein the channel number of the second dimensionality reduction feature map is smaller than the channel number of the first dimensionality reduction feature map.
6. The method of claim 3, wherein operating on the second dimension-reduced feature map to obtain the weight value matrix with the reassigned association degree of the first dimension-reduced feature map comprises:
and operating the second dimension reduction characteristic diagram through the softmax processing module to obtain a weight value matrix redistributed by the association degree of the first dimension reduction characteristic diagram.
7. The method of claim 3, wherein fusing the weight value and the first dimension-reduced feature map to obtain the spatial attention result map comprises:
and calculating a product between the weight value matrix and the first dimension reduction feature map, and determining a product calculation result as the space attention result map.
8. The method of claim 1, wherein the channel attention module comprises: a full connection module and a sigmoid module.
9. The method of claim 8, wherein processing the spatial attention result map and the target feature map by a channel attention module to obtain a channel attention distribution map comprises:
performing full-connection calculation on the space attention result graph through the full-connection module to obtain a full-connection calculation result;
carrying out normalization calculation on the full-connection calculation result through the sigmoid module to obtain a normalization calculation result;
and fusing the normalization calculation result and the target characteristic diagram to obtain the channel attention distribution diagram.
10. The method of claim 9, wherein fusing the normalized computation result and the target feature map comprises:
and performing dot product calculation on the normalization calculation result and the target characteristic graph to obtain the channel attention distribution graph.
11. An image processing method, comprising:
acquiring an image to be identified;
extracting characteristic information of the image to be recognized through a target recognition model; wherein the target recognition model comprises a spatial location feature attention module and a channel attention module, the spatial location feature attention module and the channel attention module being arranged in at least one convolution module in the target recognition model, and the spatial location feature attention module and the channel attention module processing data according to the method of any one of claims 1 to 10,
and identifying the image to be identified through the characteristic information to obtain an identification result.
12. A data processing apparatus, comprising:
the first acquisition unit is used for acquiring a target characteristic map of an image to be processed;
the first processing unit is used for performing dimensionality reduction and relevance reassignment processing on the target feature map through a spatial position feature attention module to obtain a spatial attention result map, wherein the spatial attention result map is a feature map obtained after weighted calculation is performed on each spatial position feature of the target feature map;
and the second processing unit is used for processing the space attention result graph and the target feature graph through the channel attention module to obtain a channel attention distribution graph, wherein the channel attention distribution graph represents the importance degree of each feature channel in the target feature graph.
13. An image processing apparatus characterized by comprising:
the second acquisition unit is used for acquiring an image to be identified;
the characteristic extraction unit is used for extracting the characteristic information of the image to be recognized through a target recognition model; wherein the object recognition model comprises a spatial location feature attention module and a channel attention module, the spatial location feature attention module and the channel attention module are arranged in at least one convolution module in the object recognition model, and the spatial location feature attention module and the channel attention module process data according to the method of any one of claims 1 to 10;
and the third processing unit is used for carrying out identification processing on the image to be identified through the characteristic information to obtain an identification result.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of the preceding claims 1 to 10 are implemented by the processor when executing the computer program, or the steps of the method of claim 11.
15. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the steps of the method of any of the preceding claims 1 to 10 or the steps of the method of claim 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010097917.3A CN111274999B (en) | 2020-02-17 | 2020-02-17 | Data processing method, image processing device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010097917.3A CN111274999B (en) | 2020-02-17 | 2020-02-17 | Data processing method, image processing device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274999A true CN111274999A (en) | 2020-06-12 |
CN111274999B CN111274999B (en) | 2024-04-19 |
Family
ID=71003608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010097917.3A Active CN111274999B (en) | 2020-02-17 | 2020-02-17 | Data processing method, image processing device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274999B (en) |
2020
- 2020-02-17: application CN202010097917.3A (CN) granted as CN111274999B, status Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109544560A (en) * | 2018-10-31 | 2019-03-29 | 上海商汤智能科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109829506A (en) * | 2019-02-18 | 2019-05-31 | 南京旷云科技有限公司 | Image processing method, device, electronic equipment and computer storage medium |
CN110070073A (en) * | 2019-05-07 | 2019-07-30 | 国家广播电视总局广播电视科学研究院 | Pedestrian re-identification method based on global features and local features with attention mechanism
CN110490813A (en) * | 2019-07-05 | 2019-11-22 | 特斯联(北京)科技有限公司 | Feature map enhancement method, device, equipment and medium for convolutional neural networks
CN110533045A (en) * | 2019-07-31 | 2019-12-03 | 中国民航大学 | Semantic segmentation method for luggage X-ray contraband images combining an attention mechanism
CN110610129A (en) * | 2019-08-05 | 2019-12-24 | 华中科技大学 | Deep learning face recognition system and method based on self-attention mechanism
CN110532955A (en) * | 2019-08-30 | 2019-12-03 | 中国科学院宁波材料技术与工程研究所 | Instance segmentation method and device based on feature attention and sub-upsampling
CN110781893A (en) * | 2019-09-24 | 2020-02-11 | 浙江大华技术股份有限公司 | Feature map processing method, image processing method, device and storage medium |
Non-Patent Citations (2)
Title |
---|
J. Fu et al.: "Dual Attention Network for Scene Segmentation", pages 3141 - 3149 *
ZHAO Xinxin; QIAN Shengsheng; LIU Xiaoguang: "Image recognition method for missing high-strength bolts of railway bridges based on convolutional neural networks", no. 04, pages 58 - 64 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112241679A (en) * | 2020-09-14 | 2021-01-19 | 浙江理工大学 | A method of automatic garbage classification |
CN112241679B (en) * | 2020-09-14 | 2024-02-20 | 浙江理工大学 | Automatic garbage classification method |
CN112364979A (en) * | 2020-11-05 | 2021-02-12 | 哈尔滨工业大学 | GoogLeNet-based infrared image identification method |
CN112364979B (en) * | 2020-11-05 | 2022-07-12 | 哈尔滨工业大学 | GoogLeNet-based infrared image identification method |
CN113223181A (en) * | 2021-06-02 | 2021-08-06 | 广东工业大学 | Weak texture object pose estimation method |
CN115034375A (en) * | 2022-08-09 | 2022-09-09 | 北京灵汐科技有限公司 | Data processing method and device, neural network model, device and medium |
CN116403064A (en) * | 2023-06-07 | 2023-07-07 | 苏州浪潮智能科技有限公司 | Picture processing method, model, basic block structure, device and medium |
CN116403064B (en) * | 2023-06-07 | 2023-08-25 | 苏州浪潮智能科技有限公司 | Picture processing method, system, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN111274999B (en) | 2024-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111553406B (en) | Target detection system, method and terminal based on improved YOLO-V3 | |
CN109829506B (en) | Image processing method, image processing device, electronic equipment and computer storage medium | |
CN111476306B (en) | Object detection method, device, equipment and storage medium based on artificial intelligence | |
CN111274999A (en) | Data processing method, image processing method, device and electronic equipment | |
CN110569721A (en) | Recognition model training method, image recognition method, device, equipment and medium | |
CN110738235B (en) | Pulmonary tuberculosis judging method, device, computer equipment and storage medium | |
CN108875535B (en) | Image detection method, device and system and storage medium | |
CN110765860A (en) | Tumble determination method, tumble determination device, computer apparatus, and storage medium | |
CN112580668B (en) | Background fraud detection method and device and electronic equipment | |
CN108875517B (en) | Video processing method, device and system and storage medium | |
CN113869282B (en) | Face recognition method, super-resolution model training method and related equipment | |
CN114049568B (en) | Target object deformation detection method, device, equipment and medium based on image comparison | |
CN111652054A (en) | Joint point detection method, posture recognition method and device | |
CN111062324A (en) | Face detection method and device, computer equipment and storage medium | |
CN110175975B (en) | Object detection method, device, computer readable storage medium and computer equipment | |
CN110210480B (en) | Character recognition method and device, electronic equipment and computer readable storage medium | |
CN113179421B (en) | Video cover selection method and device, computer equipment and storage medium | |
CN112115900B (en) | Image processing method, device, equipment and storage medium | |
CN114519729B (en) | Image registration quality assessment model training method, device and computer equipment | |
CN109871814B (en) | Age estimation method and device, electronic equipment and computer storage medium | |
CN113762249A (en) | Image attack detection and image attack detection model training method and device | |
CN117037244A (en) | Face security detection method, device, computer equipment and storage medium | |
CN112509052B (en) | Method, device, computer equipment and storage medium for detecting macula fovea | |
CN116978070A (en) | Resource transfer method, device, equipment and medium | |
CN112183299B (en) | Pedestrian attribute prediction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |