CN111210432A - Image semantic segmentation method based on multi-scale and multi-level attention mechanism - Google Patents
- Publication number: CN111210432A
- Application number: CN202010030667.1A
- Authority: CN (China)
- Prior art keywords: image, attention mechanism, feature, model
- Prior art date: 2020-01-12
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/10 — Image analysis: Segmentation; Edge detection
- G06F18/253 — Pattern recognition: Fusion techniques of extracted features
- G06N3/045 — Neural networks: Combinations of networks
- G06N3/084 — Neural network learning methods: Backpropagation, e.g. using gradient descent
- G06T3/4007 — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
- G06T3/60 — Rotation of whole images or parts thereof
- G06T2207/10004 — Still image; Photographic image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20132 — Image cropping
- Y02T10/40 — Engine management systems
Abstract
The invention discloses an image semantic segmentation method based on a multi-scale and multi-level attention mechanism, comprising the following steps: 1. Perform data preprocessing on the image and the ground-truth label map. 2. Build the neural network structure of the multi-scale attention mechanism model, and extract and fuse image features. 3. Build the neural network structure of the multi-level attention mechanism model, and fuse the multi-level image features. 4. Train the model: learn the neural network parameters with the back-propagation algorithm until the network converges. The invention relates to a neural network model for image semantic segmentation, in particular to a unified modeling method that extracts the self-attention information of an image at multiple scales and a network structure that fuses image features of different levels, and achieves better segmentation results in the field of semantic segmentation.
Description
Technical Field
The invention belongs to the technical field of computer vision and relates to a deep neural network model for image semantic segmentation, in particular to a method for uniformly modeling image feature data and for learning the correlations among the pixels of image features, so as to build a deep model for image semantic segmentation.
Background
Image semantic segmentation technology lets a machine automatically segment and identify the content of an image. Semantic segmentation of 2D images, video, and even 3D data is a key problem in computer vision. It is a highly difficult task aimed at scene understanding. Scene understanding, as a core problem of computer vision, is especially important now that the number of applications that extract knowledge from images is growing dramatically. These applications include autonomous driving, human-computer interaction, computational photography, image search engines, and augmented reality. Such problems were long addressed with a variety of computer vision and machine learning methods. Despite the popularity of those approaches, deep learning has changed the situation: many computer vision problems, including semantic segmentation, are now tackled with deep frameworks, typically deep convolutional neural networks, which markedly improve accuracy and efficiency. Even so, deep learning remains far less mature than classical machine learning and the other branches of computer vision, so there is still ample room for research on image semantic segmentation within the deep learning framework.
With the rapid development of deep learning in recent years, end-to-end problem modeling with convolutional neural networks (CNN) and fully convolutional networks (FCN) has become a mainstream research approach in computer vision. Introducing the idea of end-to-end modeling into image semantic segmentation algorithms, that is, using a suitable network structure to model the feature maps end to end and directly output the predicted semantic map, is a problem worth exploring in depth.
Because images of natural scenes have complex content and diverse subjects, analyzing an image semantically pixel by pixel is laborious and inefficient; finding the relations among the pixels of the feature map is an entry point to several key difficulties of this task.
In summary, introducing attention learning (the connections between pixels) into an end-to-end image semantic segmentation method is a direction worthy of deep research.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an image semantic segmentation method based on a multi-scale and multi-level attention mechanism.
The technical scheme adopted by the invention for solving the technical problems is as follows:
Given an image I and its corresponding ground-truth label map Gt, which together constitute the training set.
Step (1): preprocess the data set and extract the features of the image data
Preprocessing of image I: first flip the image horizontally at random, scale it by a random factor, and crop it to a uniform size; then extract features with a fully convolutional network to obtain the image features I_f1, I_f2, I_f3 and I_f4.
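By way of non-limiting illustration, this preprocessing can be sketched in PyTorch/torchvision as follows; the 513 × 513 crop size follows the detailed description below, while the scale range and the label padding value 255 (an ignore index) are assumptions:

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms.functional import InterpolationMode

def preprocess(image, label, crop_size=513, scale_range=(0.5, 2.0)):
    # Random horizontal flip of image and label together.
    if random.random() < 0.5:
        image, label = TF.hflip(image), TF.hflip(label)
    # Random scaling; nearest-neighbour keeps the label values intact.
    s = random.uniform(*scale_range)
    h, w = image.shape[-2:]
    new_h, new_w = int(h * s), int(w * s)
    image = TF.resize(image, [new_h, new_w])
    label = TF.resize(label, [new_h, new_w],
                      interpolation=InterpolationMode.NEAREST)
    # Pad if the scaled image is smaller than the crop window
    # (255 as the label ignore-index is an assumption).
    pad_w, pad_h = max(crop_size - new_w, 0), max(crop_size - new_h, 0)
    if pad_w or pad_h:
        image = TF.pad(image, [0, 0, pad_w, pad_h])
        label = TF.pad(label, [0, 0, pad_w, pad_h], fill=255)
    # Random crop to the uniform size.
    top = random.randint(0, max(new_h - crop_size, 0))
    left = random.randint(0, max(new_w - crop_size, 0))
    image = TF.crop(image, top, left, crop_size, crop_size)
    label = TF.crop(label, top, left, crop_size, crop_size)
    return image, label
```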
Step (2): establish a multi-scale attention mechanism model (MSM) and further extract features
The input image feature I_f4 is scaled to several different sizes by bilinear interpolation, and channel fusion finally yields the feature image I_f4_att of the specified dimensionality.
Step (3): establish a multi-level attention mechanism model (MCM) for feature fusion
The input image features I_f1, I_f2 and I_f4_att are fused effectively by the proposed multi-level attention mechanism model, yielding a feature map I_F with strong feature information and good robustness.
Step (4): model training
The input feature maps I_F and I_f3 are compared with the ground-truth label map Gt via spatial cross-entropy to obtain the difference from the true solution, and the model parameters of the fully convolutional network defined in steps (2) and (3) are trained with the back-propagation algorithm until the whole network converges.
The data preprocessing and image feature extraction of step (1) are as follows:
Extract the features of image I with an existing fully convolutional network (FCN), giving the image features I_f1, I_f2, I_f3 and I_f4, where each I_fi ∈ R^(c_i × h_i × w_i), c being the number of channels and h and w the height and width of the feature map, respectively.
The multi-scale attention mechanism model (MSM) for image semantic segmentation in step (2) performs feature fusion; the specific formulas are as follows:
2-1. For I_f4, extract feature information at different scales:

x = Conv(I_f4)    (1)
x_s = Attention(BilinearInterpolation(x, size(s))), s = 1, 2, 3, 4; size = [48, 32, 16, 8]    (2)
Y_s = Concat(BilinearInterpolation(x_s, 64), I_f4)    (3)

where Conv is a 1 × 1 convolution that reduces the channel dimension of I_f4; the BilinearInterpolation function scales a feature map by bilinear interpolation; the Concat function splices feature maps along the channel dimension. The Attention function takes a feature image x as input; the specific formulas are as follows:

x_query = Conv(x); x_key = Conv(x); x_value = Conv(x)    (4)
x_attention = Softmax(x_query^T × x_key)    (5)
x_context = x_value × x_attention    (6)
x_out = μ × x_context + x    (7)

where μ is a learnable fusion weight.
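By way of non-limiting illustration, Eqs. (4)-(7) can be sketched in PyTorch as follows; the reduced query/key width (channels // 8) and the zero initialisation of μ are assumptions, not specified in the text:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Illustrative sketch of the Attention function, Eqs. (4)-(7)."""
    def __init__(self, channels):
        super().__init__()
        # Eq. (4): three 1x1 convolutions produce query, key and value maps.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable fusion weight mu of Eq. (7), initialised to zero (assumed).
        self.mu = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)                 # B x C' x HW
        k = self.key(x).flatten(2)                   # B x C' x HW
        v = self.value(x).flatten(2)                 # B x C  x HW
        # Eq. (5): pixel-to-pixel attention map from the query-key product.
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # B x HW x HW
        # Eq. (6): weight the value map by the attention map.
        context = (v @ attn).view(b, c, h, w)
        # Eq. (7): residual fusion with the input feature.
        return self.mu * context + x
```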
2-2. Reduce the dimensionality of the Concat output and extract the feature information; the specific formula is as follows:

I_f4_att = Conv(Y_s)    (8)

where Conv is a 1 × 1 convolution that reduces the channel dimension of Y_s.
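Building on the SelfAttention sketch above, Eqs. (1)-(3) and (8) can be combined into an illustrative MSM module; the channel counts, the align_corners choice, and resizing I_f4 to the common 64 × 64 grid before concatenation are assumptions:

```python
import torch.nn.functional as F

class MSM(nn.Module):
    """Illustrative sketch of the multi-scale attention model (MSM)."""
    def __init__(self, in_channels=2048, mid_channels=512,
                 out_channels=512, sizes=(48, 32, 16, 8)):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1)     # Eq. (1)
        self.attentions = nn.ModuleList(
            SelfAttention(mid_channels) for _ in sizes)
        self.sizes = sizes
        # Eq. (8): 1x1 convolution fusing the concatenated channels.
        self.fuse = nn.Conv2d(in_channels + len(sizes) * mid_channels,
                              out_channels, 1)

    def forward(self, f4):
        x = self.reduce(f4)                                       # Eq. (1)
        # I_f4 joins the concatenation; resizing it to the common
        # 64x64 grid is an assumption.
        outs = [F.interpolate(f4, size=(64, 64), mode='bilinear',
                              align_corners=True)]
        for size, attn in zip(self.sizes, self.attentions):
            xs = F.interpolate(x, size=(size, size), mode='bilinear',
                               align_corners=True)
            xs = attn(xs)                                         # Eq. (2)
            outs.append(F.interpolate(xs, size=(64, 64), mode='bilinear',
                                      align_corners=True))        # Eq. (3)
        return self.fuse(torch.cat(outs, dim=1))                  # Eq. (8)
```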
the multi-stage attention mechanism model (MCM) for image semantic segmentation in the step (3) specifically comprises the following steps:
firstly, a multi-level attention mechanism model for image semantic segmentation is described, and the model is specifically realized as follows:
inputting low-order characteristic image x to multi-stage attention mechanism modellAnd higher order feature image xhThe concrete formula is as follows:
3-1. Unify the dimensionality and size of the two input feature maps:

x_l = Conv(x_l)    (9)
x_h = BilinearInterpolation(x_h, size(x_l))    (10)

where the Conv function is a 1 × 1 convolution that reduces the channel dimension of x_l, and the BilinearInterpolation function enlarges x_h by bilinear interpolation to the same size as x_l.
3-2. Concatenate and normalize the two feature images of the same dimensionality to obtain the attention information:

x_lh = Concat(x_l, x_h)    (11)
x_att = Softmax(Normalize(GAP(x_lh)))    (12)

where GAP is global average pooling and Softmax is defined as

Softmax(z_i) = e^(z_i) / Σ_j e^(z_j)    (13)

3-3. Apply the Hadamard product to the attention information image and the low-level feature image; the specific formula is as follows:

f_a = x_att ⊙ x_l    (14)

3-4. Sum the Hadamard-product output and the high-level feature image; the specific formula is as follows:

F_a = f_a + x_h    (15)
then sequentially adding If4_att、If2And If1The method is input into a multi-stage attention mechanism model, and the specific formula is as follows:
IF=MCM(If4_att,If2) (16)
IF=MCM(IF,If1) (17)
wherein the MCM function refers to a multi-stage attention mechanism model.
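An illustrative sketch of Eqs. (9)-(15) follows, reusing the imports above; the text does not spell out how the 2C pooled weights of Eq. (12) act on the C channels of x_l, so slicing the first C weights is an assumption:

```python
class MCM(nn.Module):
    """Illustrative sketch of the multi-level attention model (MCM)."""
    def __init__(self, low_channels, channels):
        super().__init__()
        self.reduce = nn.Conv2d(low_channels, channels, 1)

    def forward(self, x_low, x_high):
        x_l = self.reduce(x_low)                                  # Eq. (9)
        # Eq. (10): enlarge the high-level feature to the low-level size.
        x_h = F.interpolate(x_high, size=x_l.shape[-2:],
                            mode='bilinear', align_corners=True)
        x_lh = torch.cat([x_l, x_h], dim=1)                       # Eq. (11)
        # Eq. (12): GAP, then normalisation and channel-wise softmax.
        gap = F.adaptive_avg_pool2d(x_lh, 1).flatten(1)           # B x 2C
        x_att = torch.softmax(F.normalize(gap, dim=1), dim=1)
        # Assumption: the first C weights attend the low-level channels.
        x_att = x_att[:, :x_l.shape[1], None, None]
        f_a = x_att * x_l                                         # Eq. (14)
        return f_a + x_h                                          # Eq. (15)
```

Per Eqs. (16)-(17), two such modules would be chained, e.g. i_f = mcm1(i_f2, i_f4_att) followed by i_f = mcm2(i_f1, i_f); the (low, high) argument order here is an implementation choice.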
The training of the model in step (4) is as follows:
The predicted image I_F generated in step (3), the feature image I_f3 generated in step (1) and the ground-truth label map Gt are fed into the defined loss function CrossEntropyLoss to obtain the loss value Loss; the specific formulas are as follows:

Loss = CrossEntropyLoss(I_F, I_f3, Gt)    (18)

where CrossEntropyLoss is composed of two pixel-wise cross-entropy terms:

L_1 = CE(I_F, Gt)    (19)
L_2 = CE(I_f3, Gt)    (20)
Loss = L_1 + λ × L_2    (21)

where CE denotes spatial cross-entropy computed over the B images of a batch and the C class channels of the feature images, and λ is the weight balancing the two loss terms.
The parameters of the network are then adjusted with the back-propagation algorithm according to the calculated loss value Loss.
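An illustrative sketch of Eqs. (18)-(21), with L_1 and L_2 reconstructed as pixel-wise cross-entropies; the λ value and the ignore index are assumptions:

```python
def cross_entropy_loss(pred_main, pred_aux, gt, lam=0.4):
    """Illustrative sketch of Eqs. (18)-(21); lam = 0.4 is assumed."""
    l1 = F.cross_entropy(pred_main, gt, ignore_index=255)   # Eq. (19)
    l2 = F.cross_entropy(pred_aux, gt, ignore_index=255)    # Eq. (20)
    return l1 + lam * l2                                    # Eq. (21)
```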
The invention has the following beneficial effects:
compared with other methods, the method provided by the invention has relatively better performance in precision aiming at the problem of image semantic segmentation: firstly, the parameter quantity of the model is greatly reduced, the overfitting of the model is effectively prevented, and the training time of the model is reduced; second, it is simpler and easier to implement than other models. According to the invention, an attention mechanism is introduced into the end-to-end-based full convolution neural network, and image features are extracted at multiple scales and multiple levels, so that a better effect in an image semantic segmentation task is obtained.
Drawings
Fig. 1 is a general structural view of the present invention.
FIG. 2 is a multi-scale attention mechanism model of the present invention.
FIG. 3 is a multi-stage attention mechanism model of the present invention.
Fig. 4 is a visualization result of the model experiment of the present invention.
Detailed Description
In order to make the purpose and technical solution of the present invention more clearly understood, the following detailed description is made with reference to the accompanying drawings and examples, and the application principle of the present invention is described in detail.
As shown in fig. 1, fig. 2 and fig. 3, the present invention provides a deep neural network structure for image semantic segmentation; the specific steps are as follows:
the data preprocessing and the feature extraction of the image in the step (1) are specifically as follows:
the Pascal VOC2012 data set is used here as training and testing data.
For image data, the image features are extracted here using the existing 101-layer depth residual network (Resnet-101) model. Specifically, we uniformly scale the image data to 513 × 513 and input it into the depth residual network, and extract the output of res2c layer therein as the image featureExtracting the output of res3c layer as image featureExtracting the output of res4c layer as image featureExtracting the output of res5c layer as image feature
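By way of non-limiting illustration, this extraction can be sketched with torchvision (≥ 0.13 for the weights argument); mapping res2c/res3c/res4c/res5c to torchvision's layer1–layer4 stage outputs is an assumption about the Caffe-to-PyTorch naming:

```python
import torch
import torchvision
from torchvision.models._utils import IntermediateLayerGetter

backbone = torchvision.models.resnet101(weights='IMAGENET1K_V1')
extractor = IntermediateLayerGetter(
    backbone,
    return_layers={'layer1': 'f1', 'layer2': 'f2',
                   'layer3': 'f3', 'layer4': 'f4'})
feats = extractor(torch.randn(1, 3, 513, 513))
# feats['f1'] .. feats['f4'] play the roles of I_f1 .. I_f4
# (256, 512, 1024 and 2048 channels respectively for ResNet-101).
```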
The multi-scale attention mechanism model (MSM) of step (2) fuses the image features; the specific steps are as follows:
2-1. For I_f4, extract feature information at different scales. First apply a convolution to reduce I_f4 to 512 channels.
2-2. Apply bilinear interpolation to the reduced output to obtain feature images x_s of sizes 48, 32, 16 and 8.
2-3. Apply the Attention operation to the feature images at the 4 scales to extract the correlations among pixels, then upsample the Attention outputs by bilinear interpolation. The Attention operation is defined as follows:
For an input feature image x, the specific formulas are as follows:

x_query = Conv(x); x_key = Conv(x); x_value = Conv(x)    (22)
x_attention = Softmax(x_query^T × x_key)    (23)
x_context = x_value × x_attention    (24)
x_out = μ × x_context + x    (25)
2-4. Concatenate the outputs of the 4 multi-scale Attention branches with I_f4 and reduce the dimensionality to obtain the feature image I_f4_att carrying attention information.
This completes the multi-scale attention mechanism model.
The multi-level attention mechanism model (MCM) of step (3) fuses the image features; the specific steps are as follows:
3-1. Unify the input features I_f4_att and I_f2 in dimensionality and size.
3-2. Concatenate the two feature images of unified dimensionality, then apply global average pooling, regularization and normalization in turn to the concatenated output to obtain the feature image x_att carrying attention information.
3-3. Apply the Hadamard product to the attention information image x_att and the low-level feature image I_f2 to obtain f_a.
3-4. Sum the Hadamard-product output f_a and the high-level feature image I_f4_att to obtain F_a.
3-5. Take I_f1 as the low-level feature image and F_a as the high-level feature image, and repeat operations 3-1 to 3-4 to obtain the final output feature image I_F.
This completes the multi-level attention mechanism model.
The model training of step (4) is specifically as follows:
The predicted feature image I_F generated in step (3) and the feature image I_f3 generated in step (1) are upsampled to the original size 513 × 513, and their dimensionality is reduced by a convolution to the number of classes of the Pascal VOC 2012 dataset (21). They are then compared with the ground-truth label map Gt: the defined loss function CrossEntropyLoss computes the difference between the predicted values and the actual correct values as a loss value, and the Back-Propagation (BP) algorithm adjusts the parameter values of the whole network according to this loss until the network converges.
The following table shows the accuracy of the method on Pascal VOC 2012. Ours is the depth model proposed by the invention; aero, bike, etc. denote the class objects to be semantically segmented in the dataset, and mIoU denotes the mean accuracy over all classes on the semantic segmentation task.
Claims (4)
1. An image semantic segmentation method based on a multi-scale and multi-level attention mechanism is characterized by comprising the following steps:
given an image I, the corresponding real label map Gt, constitutes a training set:
step (1): data set preprocessing, feature extraction of image data
Preprocessing of image I: first flip the image horizontally at random, scale it by a random factor, and crop it to a uniform size; then extract features with a fully convolutional network to obtain the image features I_f1, I_f2, I_f3 and I_f4.
Step (2): establishing a multi-scale attention mechanism model (MSM) and further extracting characteristics
The input image feature I_f4 is scaled to several different sizes by bilinear interpolation, and channel fusion finally yields the image feature I_f4_att of the specified dimensionality.
Step (3): establishing a multi-level attention mechanism model (MCM) for feature fusion
The input image features I_f1, I_f2 and I_f4_att are fused effectively by the proposed multi-level attention mechanism model, yielding a feature map I_F with strong feature information and good robustness.
Step (4): model training
The input feature maps I_F and I_f3 are compared with the ground-truth label map Gt via spatial cross-entropy to obtain the difference from the true solution, and the model parameters of the fully convolutional network defined in steps (2) and (3) are trained with the back-propagation algorithm until the whole network converges.
2. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, characterized in that the image preprocessing of step (1) and the feature fusion of the multi-scale attention mechanism model (MSM) of step (2) are as follows:
2-1. Extract the features of image I with an existing fully convolutional network (FCN), giving the image features I_f1, I_f2, I_f3 and I_f4, where each I_fi ∈ R^(c_i × h_i × w_i), c being the number of channels and h and w the height and width of the feature map, respectively.
2-2. For I_f4, extract feature information at different scales; the specific formulas are as follows:

x = Conv(I_f4)    (1)
x_s = Attention(BilinearInterpolation(x, size(s))), s = 1, 2, 3, 4; size = [48, 32, 16, 8]    (2)
Y_s = Concat(BilinearInterpolation(x_s, 64), I_f4)    (3)

where Conv is a 1 × 1 convolution that reduces the channel dimension of I_f4; the BilinearInterpolation function scales a feature map by bilinear interpolation; the Concat function splices the feature images along the channel dimension. The Attention function takes a feature image x as input; the specific formulas are as follows:

x_query = Conv(x); x_key = Conv(x); x_value = Conv(x)    (4)
x_attention = Softmax(x_query^T × x_key)    (5)
x_context = x_value × x_attention    (6)
x_out = μ × x_context + x    (7)
2-3. Reduce the dimensionality of the Concat output and extract the feature information; the specific formula is as follows:

I_f4_att = Conv(Y_s)    (8)

where Conv is a 1 × 1 convolution that reduces the channel dimension of Y_s.
3. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, characterized in that the multi-level attention mechanism model (MCM) for image semantic segmentation in step (3) is specified as follows:
First, the implementation of the multi-level attention mechanism model is described. The model takes as input a low-level feature image x_l and a high-level feature image x_h; the specific formulas are as follows:
3-1. Unify the dimensionality and size of the two input feature maps:

x_l = Conv(x_l)    (9)
x_h = BilinearInterpolation(x_h, size(x_l))    (10)

where the Conv function is a 1 × 1 convolution that reduces the channel dimension of x_l, and the BilinearInterpolation function enlarges x_h by bilinear interpolation to the same size as x_l.
3-2. Concatenate and normalize the two feature images of the same dimensionality to obtain the attention information:

x_lh = Concat(x_l, x_h)    (11)
x_att = Softmax(Normalize(GAP(x_lh)))    (12)

where GAP is global average pooling and Softmax is defined as

Softmax(z_i) = e^(z_i) / Σ_j e^(z_j)    (13)
3-3. Apply the Hadamard product to the attention information image x_att and the low-level feature image x_l; the specific formula is as follows:

f_a = x_att ⊙ x_l    (14)

3-4. Sum the Hadamard-product output and the high-level feature image x_h; the specific formula is as follows:

F_a = f_a + x_h    (15)
I_f4_att, I_f2 and I_f1 are then fed in turn into the multi-level attention mechanism model (MCM); the specific formulas are as follows:

I_F = MCM(I_f4_att, I_f2)    (16)
I_F = MCM(I_F, I_f1)    (17)

where the MCM function denotes the multi-level attention mechanism model.
4. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, characterized in that the training of the model in step (4) is as follows:
The predicted image I_F generated in step (3), the feature image I_f3 generated in step (1) and the ground-truth label map Gt are fed into the defined loss function CrossEntropyLoss to obtain the loss value Loss; the specific formulas are as follows:

Loss = CrossEntropyLoss(I_F, I_f3, Gt)    (18)

where CrossEntropyLoss is composed of two pixel-wise cross-entropy terms:

L_1 = CE(I_F, Gt)    (19)
L_2 = CE(I_f3, Gt)    (20)
Loss = L_1 + λ × L_2    (21)

where CE denotes spatial cross-entropy computed over the B images of a batch and the C class channels of the feature images, and λ is the weight balancing the two loss terms.
The parameters of the network are then adjusted with the back-propagation algorithm according to the calculated loss value Loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030667.1A CN111210432B (en) | 2020-01-12 | 2020-01-12 | Image semantic segmentation method based on multi-scale multi-level attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030667.1A CN111210432B (en) | 2020-01-12 | 2020-01-12 | Image semantic segmentation method based on multi-scale multi-level attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111210432A true CN111210432A (en) | 2020-05-29 |
CN111210432B CN111210432B (en) | 2023-07-25 |
Family
ID=70786703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010030667.1A Active CN111210432B (en) | 2020-01-12 | 2020-01-12 | Image semantic segmentation method based on multi-scale multi-level attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111210432B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018153322A1 (en) * | 2017-02-23 | 2018-08-30 | 北京市商汤科技开发有限公司 | Key point detection method, neural network training method, apparatus and electronic device |
CN110163878A (en) * | 2019-05-28 | 2019-08-23 | 四川智盈科技有限公司 | A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism |
CN110188685A (en) * | 2019-05-30 | 2019-08-30 | 燕山大学 | A kind of object count method and system based on the multiple dimensioned cascade network of double attentions |
Non-Patent Citations (2)
Title |
---|
张东波; 易良玲; 许海霞; 张莹: "Multi-scale local structure dominated binary pattern learning for image representation" *
赵斐: "Semantic segmentation of remote sensing images based on a pyramid attention mechanism" *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111667495A (en) * | 2020-06-08 | 2020-09-15 | 北京环境特性研究所 | Image scene analysis method and device |
CN111860517A (en) * | 2020-06-28 | 2020-10-30 | 广东石油化工学院 | Semantic segmentation method under small sample based on decentralized attention network |
CN111860517B (en) * | 2020-06-28 | 2023-07-25 | 广东石油化工学院 | Semantic segmentation method under small sample based on distraction network |
CN112233129A (en) * | 2020-10-20 | 2021-01-15 | 湘潭大学 | Deep learning-based parallel multi-scale attention mechanism semantic segmentation method and device |
CN112465828A (en) * | 2020-12-15 | 2021-03-09 | 首都师范大学 | Image semantic segmentation method and device, electronic equipment and storage medium |
CN112465828B (en) * | 2020-12-15 | 2024-05-31 | 益升益恒(北京)医学技术股份公司 | Image semantic segmentation method and device, electronic equipment and storage medium |
CN113221969A (en) * | 2021-04-25 | 2021-08-06 | 浙江师范大学 | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion |
WO2022227913A1 (en) * | 2021-04-25 | 2022-11-03 | 浙江师范大学 | Double-feature fusion semantic segmentation system and method based on internet of things perception |
CN114140322A (en) * | 2021-11-19 | 2022-03-04 | 华中科技大学 | Attention-guided interpolation method and low-delay semantic segmentation method |
CN114140322B (en) * | 2021-11-19 | 2024-07-05 | 华中科技大学 | Attention-directed interpolation method and low-latency semantic segmentation method |
CN114677380A (en) * | 2022-03-25 | 2022-06-28 | 西安交通大学 | Video object segmentation method and system based on diversified interaction |
Also Published As
Publication number | Publication date |
---|---|
CN111210432B (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111210432A (en) | Image semantic segmentation method based on multi-scale and multi-level attention mechanism | |
CN111858954B (en) | Task-oriented text-generated image network model | |
US11328172B2 (en) | Method for fine-grained sketch-based scene image retrieval | |
CN111079532B (en) | Video content description method based on text self-encoder | |
JP7291183B2 (en) | Methods, apparatus, devices, media, and program products for training models | |
CN112132197B (en) | Model training, image processing method, device, computer equipment and storage medium | |
CN112990116B (en) | Behavior recognition device and method based on multi-attention mechanism fusion and storage medium | |
CN111242844B (en) | Image processing method, device, server and storage medium | |
US20220270384A1 (en) | Method for training adversarial network model, method for building character library, electronic device, and storage medium | |
CN117033609B (en) | Text visual question-answering method, device, computer equipment and storage medium | |
CN113837290A (en) | Unsupervised unpaired image translation method based on attention generator network | |
Yang et al. | Xception-based general forensic method on small-size images | |
CN114299305B (en) | Saliency target detection algorithm for aggregating dense and attention multi-scale features | |
CN115482387A (en) | Weak supervision image semantic segmentation method and system based on multi-scale class prototype | |
CN110633706A (en) | Semantic segmentation method based on pyramid network | |
EP4170547A1 (en) | Method for extracting data features, and related apparatus | |
CN110110775A (en) | A kind of matching cost calculation method based on hyper linking network | |
CN114333062A (en) | Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency | |
CN118229632A (en) | Display screen defect detection method, model training method, device, equipment and medium | |
CN117036699A (en) | Point cloud segmentation method based on Transformer neural network | |
CN110516669B (en) | Multi-level and multi-scale fusion character detection method in complex environment | |
CN114140317A (en) | Image animation method based on cascade generation confrontation network | |
CN113096176A (en) | Semantic segmentation assisted binocular vision unsupervised depth estimation method | |
CN114140667A (en) | Small sample rapid style migration method based on deep convolutional neural network | |
Zhao et al. | Adapting vision transformer for efficient change detection |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |