
CN111210432A - An image semantic segmentation method based on multi-scale and multi-level attention mechanism - Google Patents


Info

Publication number
CN111210432A
CN111210432A
Authority
CN
China
Prior art keywords
image
follows
attention mechanism
feature
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010030667.1A
Other languages
Chinese (zh)
Other versions
CN111210432B (en)
Inventor
许海霞
黄云佳
刘用
周维
王帅龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202010030667.1A priority Critical patent/CN111210432B/en
Publication of CN111210432A publication Critical patent/CN111210432A/en
Application granted granted Critical
Publication of CN111210432B publication Critical patent/CN111210432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/60 Rotation of whole images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract



The invention discloses an image semantic segmentation method based on a multi-scale and multi-level attention mechanism. The method comprises the following steps: 1. Perform data preprocessing on the images and the ground-truth label maps. 2. Build the neural network structure of the multi-scale attention mechanism model, and perform image feature extraction and fusion. 3. Build the neural network structure of the multi-level attention mechanism model, and fuse features across levels. 4. Train the model, using the back-propagation algorithm to learn the neural network parameters until the network converges. The invention provides a neural network model for image semantic segmentation; in particular, it proposes a unified modeling method that extracts an image's self-attention information at multiple scales, together with a network structure that fuses image features across different levels, and achieves good segmentation results in the field of semantic segmentation.


Description

Image semantic segmentation method based on multi-scale and multi-level attention mechanism
Technical Field
The invention belongs to the technical field of computer vision, relates to a deep neural network model for image semantic segmentation, and particularly relates to a method for uniformly modeling image feature data and a method for learning relevance among pixel points on image features so as to establish a deep model for image semantic segmentation.
Background
Image semantic segmentation is the task of having a machine automatically segment and recognize the content of an image. Semantic segmentation of 2D images, video, and even 3D data is a key problem in computer vision. It is a highly challenging task aimed at scene understanding. Scene understanding, as a core problem of computer vision, is particularly important today, when the number of applications that extract knowledge from images is growing dramatically. These applications include autonomous driving, human-computer interaction, computational photography, image search engines, and augmented reality. Such problems were previously addressed with a variety of computer vision and machine learning methods. Despite the popularity of those approaches, deep learning has changed the situation, and many computer vision problems, including semantic segmentation, are now addressed with deep frameworks, typically deep convolutional neural networks, which can significantly improve accuracy and efficiency. Nevertheless, deep learning is still far from mature compared with longer-established branches of machine learning and computer vision. In view of this, there remains considerable room for research on semantic segmentation of images under the deep learning framework.
With the rapid development of deep learning in recent years, end-to-end problem modeling with deep convolutional neural networks (CNNs) and fully convolutional networks (FCNs) has become a mainstream research approach in computer vision. Introducing end-to-end modeling into image semantic segmentation algorithms, modeling the feature maps end to end with a suitable network structure, and directly outputting the predicted semantic map is a problem worthy of in-depth discussion.
Because images of natural scenes have complex content and varied subjects, analyzing semantics pixel by pixel is too laborious and inefficient; finding the relationships among the pixels of the feature map is therefore an entry point to several key difficulties of the task.
In summary, introducing attention learning (the relationships between pixels) into an end-to-end image semantic segmentation method is a direction worth deep research.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an image semantic segmentation method based on a multi-scale and multi-level attention mechanism.
The technical scheme adopted by the invention for solving the technical problems is as follows:
given an image I, the corresponding real label map Gt constitutes a training set.
Step (1), preprocessing a data set, and extracting the characteristics of image data
Preprocess image I: first horizontally flip the image, randomly scale its size, and crop it to a uniform size; then extract features with a fully convolutional network to obtain the image features I_f1, I_f2, I_f3 and I_f4.
Step (2), establishing a multi-scale attention mechanism model (MSM) and further extracting characteristics
Input the image feature I_f4; scale it to different degrees through bilinear interpolation, and finally perform channel fusion to obtain a feature map I_f4_att of the specified dimensionality.
Step (3), establishing a multi-stage attention mechanism model (MCM) for feature fusion
Input the image features I_f1, I_f2 and I_f4_att; the proposed multi-stage attention mechanism model fuses the three features effectively to obtain a feature map I_F with strong feature information and good robustness.
Step (4), model training
Input the feature maps I_F and I_f2, perform spatial cross-entropy calculation with the ground-truth label map Gt to obtain the difference from the true solution, and train the model parameters of the fully convolutional network defined in steps (2) and (3) with the back-propagation algorithm until the whole network model converges.
Data preprocessing and image feature extraction in step (1):
Extract features from image I with an existing fully convolutional network (FCN) to form the image features I_f1, I_f2, I_f3 and I_f4, where each feature has shape c × h × w: c is the number of channels of the image feature, and h and w are its height and width, respectively.
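The augmentations described in step (1) (horizontal flip, random scale, crop to a uniform size) can be sketched as follows. The crop size of 321 and the scale range [0.5, 2.0] are illustrative assumptions, not values from the patent, and a nearest-neighbour index mapping stands in for a real resize:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(image, crop=321):
    """Sketch of the step-(1) augmentations: random horizontal flip,
    random scaling, crop to a uniform size. `crop` and the scale
    range are illustrative assumptions."""
    # Random horizontal flip.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random scaling via nearest-neighbour index mapping (a stand-in
    # for a real resize; the patent does not fix the interpolation here).
    s = rng.uniform(0.5, 2.0)
    h, w = image.shape[:2]
    nh, nw = max(int(h * s), crop), max(int(w * s), crop)
    ys = (np.arange(nh) * h // nh).clip(0, h - 1)
    xs = (np.arange(nw) * w // nw).clip(0, w - 1)
    image = image[ys][:, xs]
    # Random crop to the uniform training size.
    top = rng.integers(0, image.shape[0] - crop + 1)
    left = rng.integers(0, image.shape[1] - crop + 1)
    return image[top:top + crop, left:left + crop]

out = preprocess(np.zeros((500, 400, 3)))
```

Whatever the input size, the output is always the uniform crop size, which is what allows batched training downstream.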
The multi-scale attention mechanism model (MSM) for image semantic segmentation in the step (2) is used for feature fusion, and the specific formula is as follows:
2-1. For I_f4, extract feature information at different scales; the specific formulas are as follows:

x = Conv(I_f4)  (1)
x_s = Attention(bilinear_interpolation(x, size(s))), s = 1, 2, 3, 4; size = [48, 32, 16, 8]  (2)
Y_s = Concat(bilinear_interpolation(x_s, 64), I_f4)  (3)
where Conv is a 1 × 1 convolution that reduces the channel dimension of I_f4; the bilinear_interpolation function scales features up or down by bilinear interpolation; the Concat function concatenates (splices) the features. The Attention function is defined as follows:
For the Attention function with input feature image x, the specific formulas are as follows:

x_query = Conv(x); x_key = Conv(x); x_value = Conv(x)  (4)
x_attention = Softmax(x_query^T × x_key)  (5)
x_context = x_value^T × x_attention  (6)
x_out = μ × x_context + x  (7)

where μ denotes a learnable coefficient and ^T denotes matrix transposition.
2-2. Reduce the dimensionality of the Concat output and extract feature information; the specific formula is as follows:

I_f4_att = Conv(Y_s)  (8)

where Conv is a 1 × 1 convolution that reduces the channel dimension of Y_s.
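Under the patent's loose matrix notation, the Attention function amounts to standard self-attention over the flattened feature map. A minimal NumPy sketch follows; the three 1 × 1 convolutions are modelled as random c × c projections and μ is fixed to an illustrative value, both assumptions rather than trained quantities:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, mu=0.1, seed=0):
    """Sketch of the Attention function: self-attention over a
    (c, h, w) feature map. The 1x1 convolutions are modelled as
    random projection matrices; mu is the learnable coefficient
    (both illustrative assumptions)."""
    c, h, w = x.shape
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    f = x.reshape(c, h * w)              # flatten spatial dims: (c, n)
    q, k, v = Wq @ f, Wk @ f, Wv @ f     # query / key / value projections
    attn = softmax(q.T @ k, axis=-1)     # (n, n) pixel-pair relevance map
    context = v @ attn.T                 # aggregate values under attention
    out = mu * context + f               # residual connection scaled by mu
    return out.reshape(c, h, w)

y = attention(np.ones((8, 4, 4)))
```

The (n, n) attention map is what encodes "relevance among pixel points"; the residual term keeps the original feature intact when μ is small.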
the multi-stage attention mechanism model (MCM) for image semantic segmentation in the step (3) specifically comprises the following steps:
firstly, a multi-level attention mechanism model for image semantic segmentation is described, and the model is specifically realized as follows:
The multi-stage attention mechanism model takes as input a low-order feature image x_l and a high-order feature image x_h; the specific formulas are as follows:
3-1. Unify the dimensionality and size of the two input feature maps:

x_l = Conv(x_l)  (9)
x_h = bilinear_interpolation(x_h, size(x_l))  (10)

where the Conv function is a 1 × 1 convolution that reduces the channel dimension of x_l; the bilinear_interpolation function enlarges x_h by bilinear interpolation to match the size of x_l.
3-2. Concatenate and normalize the two feature images of the same dimensionality to obtain the attention information:

x_lh = Concat(x_l, x_h)  (11)
x_att = Softmax(Normalize(GAP(x_lh)))  (12)

where GAP is global average pooling, and the Softmax formula is as follows:

Softmax(x_i) = e^(x_i) / Σ_j e^(x_j)  (13)
3-3. Perform the Hadamard product between the attention information image and the low-order feature image; the specific formula is as follows:

f_a = x_att ⊙ x_l  (14)

3-4. Sum the Hadamard product output and the high-order feature image; the specific formula is as follows:

F_a = f_a + x_h  (15)
Then input I_f4_att, I_f2 and I_f1 in sequence into the multi-stage attention mechanism model; the specific formulas are as follows:

I_F = MCM(I_f4_att, I_f2)  (16)
I_F = MCM(I_F, I_f1)  (17)

where the MCM function refers to the multi-stage attention mechanism model.
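The MCM fusion amounts to channel-attention gating of the low-order feature followed by a residual sum with the high-order feature. A rough NumPy sketch, assuming both inputs are already at a common shape, with the projection of the pooled 2c-vector back to c channels added as an assumption (the patent leaves this channel bookkeeping implicit):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mcm(x_l, x_h, seed=0):
    """Sketch of the MCM fusion. x_l, x_h: (c, h, w), already brought
    to a common shape (the unification step is omitted here). The
    projection mapping the pooled 2c-vector back to c channels is an
    assumption."""
    c = x_l.shape[0]
    x_lh = np.concatenate([x_l, x_h], axis=0)        # channel concat, (2c, h, w)
    gap = x_lh.mean(axis=(1, 2))                     # global average pooling, (2c,)
    gap = (gap - gap.mean()) / (gap.std() + 1e-5)    # normalize
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((c, 2 * c)) / np.sqrt(2 * c)  # assumed projection
    x_att = softmax(proj @ gap)                      # per-channel attention, (c,)
    f_a = x_att[:, None, None] * x_l                 # Hadamard (channel-wise) gating
    return f_a + x_h                                 # residual sum with x_h

F = mcm(np.ones((16, 8, 8)), np.ones((16, 8, 8)))
```

Gating only the low-order feature lets the pooled statistics of both levels decide how much low-level detail to re-inject, while the high-order feature always passes through the residual path.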
The training model in the step (4) is as follows:
Input the predicted image I_F generated in step (3), the feature image I_f3 generated in step (1), and the ground-truth label map Gt into the defined loss function CrossEntropyLoss to obtain the loss value Loss; the specific formulas are as follows:

Loss = CrossEntropyLoss(I_F, I_f3, Gt)  (18)

where CrossEntropyLoss is computed as:

L1 = -(1/B) Σ_{i=1..B} Σ_{j=1..C} Gt_ij × log(Softmax(I_F)_ij)  (19)
L2 = -(1/B) Σ_{i=1..B} Σ_{j=1..C} Gt_ij × log(Softmax(I_f3)_ij)  (20)
Loss = L1 + λ × L2  (21)

where B is the number of images input to the neural network, C is the number of channels of the feature images, and λ is the weight between the two loss terms.
According to the calculated loss value Loss, adjust the parameters in the network with the back-propagation algorithm.
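The combined loss, a main cross-entropy term plus a λ-weighted auxiliary term, can be sketched as below. The per-pixel form and the value λ = 0.4 are illustrative assumptions; the patent does not publish its λ:

```python
import numpy as np

def pixel_cross_entropy(logits, labels):
    """Per-pixel softmax cross-entropy.
    logits: (C, h, w) class scores; labels: (h, w) integer class ids."""
    z = logits - logits.max(axis=0, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the true class at every pixel.
    return -log_p[labels, np.arange(h)[:, None], np.arange(w)].mean()

def total_loss(pred_main, pred_aux, labels, lam=0.4):
    """Loss = L1 + lambda * L2: L1 scores the final map I_F, L2 the
    auxiliary map I_f3. lam = 0.4 is an illustrative choice."""
    L1 = pixel_cross_entropy(pred_main, labels)
    L2 = pixel_cross_entropy(pred_aux, labels)
    return L1 + lam * L2

# Uniform logits over 3 classes give -log(1/3) per term.
example = total_loss(np.zeros((3, 2, 2)), np.zeros((3, 2, 2)),
                     np.zeros((2, 2), dtype=int))
```

The auxiliary term acts as deep supervision on the intermediate feature map, which is a common way to stabilize training of such fusion networks.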
The invention has the following beneficial effects:
Compared with other methods for image semantic segmentation, the proposed method performs relatively better in accuracy. First, the number of model parameters is greatly reduced, which effectively prevents overfitting and shortens training time. Second, the model is simpler and easier to implement than alternatives. By introducing an attention mechanism into an end-to-end fully convolutional network and extracting image features at multiple scales and multiple levels, the invention obtains better results on the image semantic segmentation task.
Drawings
Fig. 1 is a general structural view of the present invention.
FIG. 2 is a multi-scale attention mechanism model of the present invention.
FIG. 3 is a multi-stage attention mechanism model of the present invention.
Fig. 4 is a visualization result of the model experiment of the present invention.
Detailed Description
In order to make the purpose and technical solution of the present invention more clearly understood, the following detailed description is made with reference to the accompanying drawings and examples, and the application principle of the present invention is described in detail.
As shown in fig. 1, fig. 2 and fig. 3, the present invention provides a deep neural network structure for image semantic segmentation; the specific steps are as follows:
the data preprocessing and the feature extraction of the image in the step (1) are specifically as follows:
the Pascal VOC2012 data set is used here as training and testing data.
For image data, image features are extracted with the existing 101-layer deep residual network (ResNet-101) model. Specifically, the image data are uniformly scaled to 513 × 513 and input into the deep residual network; the output of the res2c layer is extracted as image feature I_f1, the output of res3c as I_f2, the output of res4c as I_f3, and the output of res5c as I_f4.
The multi-scale attention mechanism model (MSM) in the step (2) fuses image features, and the method specifically comprises the following steps:
2-1. For I_f4, extract feature information at different scales. First reduce I_f4 to 512 channels with a convolution operation.
2-2. Apply bilinear interpolation to the dimensionality-reduced output to obtain feature images x_s of sizes 48, 32, 16 and 8.
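The bilinear scaling used throughout these steps can be sketched as a plain NumPy routine. Align-corners sampling is an assumption here, since the patent does not fix the convention:

```python
import numpy as np

def bilinear_resize(x, size):
    """Minimal bilinear interpolation for a (c, h, w) feature map,
    producing a (c, size, size) output (align-corners sampling is an
    assumed convention)."""
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, size)          # fractional source rows
    xs = np.linspace(0, w - 1, size)          # fractional source cols
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]             # vertical blend weights
    wx = (xs - x0)[None, None, :]             # horizontal blend weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

small = bilinear_resize(np.ones((4, 64, 64)), 16)
```

The same routine can both shrink features to the pyramid sizes 48, 32, 16 and 8 and enlarge the attention outputs back to a common size before concatenation.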
2-3. Apply the Attention operation to the feature images at the 4 scales to extract the relevance between pixels, then upsample the attention outputs by bilinear interpolation. The specific formulas of the Attention operation are as follows:
For the Attention function with input feature image x, the specific formulas are as follows:

x_query = Conv(x); x_key = Conv(x); x_value = Conv(x)  (22)
x_attention = Softmax(x_query^T × x_key)  (23)
x_context = x_value^T × x_attention  (24)
x_out = μ × x_context + x  (25)

where μ denotes a learnable coefficient and ^T denotes matrix transposition.
2-4. Concatenate the outputs of the 4 multi-scale attention branches with I_f4 and reduce the dimensionality to obtain the feature image I_f4_att with attention information.
This completes the operations of the multi-scale attention mechanism model.
The multi-stage attention mechanism model (MCM) in step (3) fuses the image features; the specific steps are as follows:
3-1. Unify the input features I_f4_att and I_f2 in dimensionality and scale.
3-2. Concatenate the two dimension-unified feature images, then apply global average pooling, regularization and normalization in sequence to the concatenated output to obtain the feature image x_att with attention information.
3-3. Perform the Hadamard product between the attention information image x_att and the low-order feature image I_f2 to obtain f_a.
3-4. Sum the Hadamard product output f_a and the high-order feature image I_f4_att to obtain F_a.
3-5. Take I_f1 as the low-order feature image and F_a as the high-order feature image, repeat operations 3-1 to 3-4, and obtain the final output feature image I_F.
This completes the operation of the multi-stage attention mechanism model.
The training model in the step (4) is specifically as follows:
for the prediction characteristic image generated in the step (3)
Figure BDA0002363573340000063
And the characteristic image generated in the step (1)
Figure BDA0002363573340000064
An upsample operation is performed to the original size 513 × 513 and the dimensions are reduced to the number of classes of the Pascal VOC2012 data set by a convolution operation (21). Comparing the loss value with a real tag graph Gt of a data set, calculating to obtain the difference between a predicted value and an actual correct value through a defined loss function Cross EntropyLoss and forming a loss value, and then adjusting the parameter value of the whole network by using a Back-Propagation (BP) algorithm according to the loss value until the network converges.
The following table shows the accuracy of the method of the invention on Pascal VOC 2012. "Ours" is the depth model proposed by the invention; "aero", "bike", etc. denote the class objects to be semantically segmented in the dataset, and mIoU denotes the mean accuracy over all classes on the semantic segmentation task.

Claims (4)

1. An image semantic segmentation method based on a multi-scale and multi-level attention mechanism is characterized by comprising the following steps:
given an image I, the corresponding real label map Gt, constitutes a training set:
step (1): data set preprocessing, feature extraction of image data
Preprocess image I: first horizontally flip the image, randomly scale its size, and crop it to a uniform size; then extract features with a fully convolutional network to obtain the image features I_f1, I_f2, I_f3 and I_f4.
Step (2): establishing a multi-scale attention mechanism model (MSM) and further extracting characteristics
Input the image feature I_f4; scale it to different degrees through bilinear interpolation, and finally perform channel fusion to obtain image features I_f4_att of the specified dimensionality.
And (3): establishing a multi-level attention mechanism model (MCM) for feature fusion
Input the image features I_f1, I_f2 and I_f4_att; the proposed multi-stage attention mechanism model fuses the three features effectively to obtain a feature map I_F with strong feature information and good robustness.
And (4): model training
Input the feature maps I_F and I_f2, perform spatial cross-entropy calculation with the ground-truth label map Gt to obtain the difference from the true solution, and train the model parameters of the fully convolutional network defined in steps (2) and (3) with the back-propagation algorithm until the whole network model converges.
2. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, characterized in that the image preprocessing of step (1) and the feature fusion of the multi-scale attention mechanism model (MSM) of step (2) are as follows:
2-1. Extract features from image I with an existing fully convolutional network (FCN) to form the image features I_f1, I_f2, I_f3 and I_f4, where each feature has shape c × h × w: c is the number of channels of the image feature, and h and w are its height and width, respectively.
2-2. For I_f4, extract feature information at different scales; the specific formulas are as follows:

x = Conv(I_f4)  (1)
x_s = Attention(bilinear_interpolation(x, size(s))), s = 1, 2, 3, 4; size = [48, 32, 16, 8]  (2)
Y_s = Concat(bilinear_interpolation(x_s, 64), I_f4)  (3)
where Conv is a 1 × 1 convolution that reduces the channel dimension of I_f4; the bilinear_interpolation function scales features by bilinear interpolation; the Concat function concatenates the feature images. The Attention function is defined as follows:
For the Attention function with input feature image x, the specific formulas are as follows:

x_query = Conv(x); x_key = Conv(x); x_value = Conv(x)  (4)
x_attention = Softmax(x_query^T × x_key)  (5)
x_context = x_value^T × x_attention  (6)
x_out = μ × x_context + x  (7)

where μ denotes a learnable coefficient and ^T denotes matrix transposition.
2-3. Reduce the dimensionality of the Concat output and extract feature information; the specific formula is as follows:

I_f4_att = Conv(Y_s)  (8)

where Conv is a 1 × 1 convolution that reduces the channel dimension of Y_s.
3. The image semantic segmentation method based on the multi-scale multi-stage attention mechanism as claimed in claim 1, wherein the multi-stage attention mechanism model (MCM) for image semantic segmentation in step (3) is specifically as follows:
firstly, the specific implementation of the multi-level attention mechanism model for image semantic segmentation is described as follows:
The multi-stage attention mechanism model takes as input a low-order feature image x_l and a high-order feature image x_h; the specific formulas are as follows:
3-1. Unify the dimensionality and size of the two input feature maps:

x_l = Conv(x_l)  (9)
x_h = bilinear_interpolation(x_h, size(x_l))  (10)

where the Conv function is a 1 × 1 convolution that reduces the channel dimension of x_l; the bilinear_interpolation function enlarges x_h by bilinear interpolation to match the size of x_l.
3-2. Concatenate and normalize the two feature images of the same dimensionality to obtain the attention information:

x_lh = Concat(x_l, x_h)  (11)
x_att = Softmax(Normalize(GAP(x_lh)))  (12)

where GAP is global average pooling, and the Softmax formula is as follows:

Softmax(x_i) = e^(x_i) / Σ_j e^(x_j)  (13)
3-3. Perform the Hadamard product between the attention information image x_att and the low-order feature image x_l; the specific formula is as follows:

f_a = x_att ⊙ x_l  (14)

3-4. Sum the Hadamard product output and the high-order feature image x_h; the specific formula is as follows:

F_a = f_a + x_h  (15)
Then input I_f4_att, I_f2 and I_f1 in sequence into the multi-stage attention mechanism model (MCM); the specific formulas are as follows:

I_F = MCM(I_f4_att, I_f2)  (16)
I_F = MCM(I_F, I_f1)  (17)

where the MCM function refers to the multi-stage attention mechanism model.
4. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, wherein the training model in the step (4) is as follows:
Input the predicted image I_F generated in step (3), the feature image I_f3 generated in step (1), and the ground-truth label map Gt into the defined loss function CrossEntropyLoss to obtain the loss value Loss; the specific formulas are as follows:

Loss = CrossEntropyLoss(I_F, I_f3, Gt)  (18)

where CrossEntropyLoss is computed as:

L1 = -(1/B) Σ_{i=1..B} Σ_{j=1..C} Gt_ij × log(Softmax(I_F)_ij)  (19)
L2 = -(1/B) Σ_{i=1..B} Σ_{j=1..C} Gt_ij × log(Softmax(I_f3)_ij)  (20)
Loss = L1 + λ × L2  (21)

where B is the number of images input to the neural network, C is the number of channels of the feature images, and λ is the weight between the two loss terms.
According to the calculated loss value Loss, the parameters in the network are adjusted with the back-propagation algorithm.
CN202010030667.1A 2020-01-12 2020-01-12 A Semantic Image Segmentation Method Based on Multi-scale and Multi-level Attention Mechanism Active CN111210432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030667.1A CN111210432B (en) 2020-01-12 2020-01-12 A Semantic Image Segmentation Method Based on Multi-scale and Multi-level Attention Mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030667.1A CN111210432B (en) 2020-01-12 2020-01-12 A Semantic Image Segmentation Method Based on Multi-scale and Multi-level Attention Mechanism

Publications (2)

Publication Number Publication Date
CN111210432A true CN111210432A (en) 2020-05-29
CN111210432B CN111210432B (en) 2023-07-25

Family

ID=70786703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030667.1A Active CN111210432B (en) 2020-01-12 2020-01-12 A Semantic Image Segmentation Method Based on Multi-scale and Multi-level Attention Mechanism

Country Status (1)

Country Link
CN (1) CN111210432B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667495A (en) * 2020-06-08 2020-09-15 北京环境特性研究所 Image scene analysis method and device
CN111860517A (en) * 2020-06-28 2020-10-30 广东石油化工学院 A small-sample semantic segmentation method based on distraction network
CN112233129A (en) * 2020-10-20 2021-01-15 湘潭大学 A parallel multi-scale attention mechanism semantic segmentation method and device based on deep learning
CN112465828A (en) * 2020-12-15 2021-03-09 首都师范大学 Image semantic segmentation method and device, electronic equipment and storage medium
CN113221969A (en) * 2021-04-25 2021-08-06 浙江师范大学 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion
CN114140322A (en) * 2021-11-19 2022-03-04 华中科技大学 Attention-Guided Interpolation and Low-Latency Semantic Segmentation
CN114677380A (en) * 2022-03-25 2022-06-28 西安交通大学 A method and system for video object segmentation based on diverse interactions

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN110163878A (en) * 2019-05-28 2019-08-23 四川智盈科技有限公司 A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism
CN110188685A (en) * 2019-05-30 2019-08-30 燕山大学 A target counting method and system based on double-attention multi-scale cascade network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Dongbo; Yi Liangling; Xu Haixia; Zhang Ying: "Multi-scale local-structure-dominant binary pattern learning for image representation" *
Zhao Fei: "Semantic segmentation of remote sensing images based on a pyramid attention mechanism" *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667495A (en) * 2020-06-08 2020-09-15 北京环境特性研究所 Image scene analysis method and device
CN111860517A (en) * 2020-06-28 2020-10-30 广东石油化工学院 A small-sample semantic segmentation method based on distraction network
CN111860517B (en) * 2020-06-28 2023-07-25 广东石油化工学院 A small-sample semantic segmentation method based on a distraction network
CN112233129A (en) * 2020-10-20 2021-01-15 湘潭大学 A parallel multi-scale attention mechanism semantic segmentation method and device based on deep learning
CN112465828A (en) * 2020-12-15 2021-03-09 首都师范大学 Image semantic segmentation method and device, electronic equipment and storage medium
CN112465828B (en) * 2020-12-15 2024-05-31 益升益恒(北京)医学技术股份公司 Image semantic segmentation method and device, electronic equipment and storage medium
CN113221969A (en) * 2021-04-25 2021-08-06 浙江师范大学 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception
CN114140322A (en) * 2021-11-19 2022-03-04 华中科技大学 Attention-Guided Interpolation and Low-Latency Semantic Segmentation
CN114140322B (en) * 2021-11-19 2024-07-05 华中科技大学 Attention-guided interpolation method and low-latency semantic segmentation method
CN114677380A (en) * 2022-03-25 2022-06-28 西安交通大学 A method and system for video object segmentation based on diverse interactions

Also Published As

Publication number Publication date
CN111210432B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111210432A (en) An image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111858954B (en) A task-oriented text-to-image generation network model
Zhang et al. Weakly supervised semantic segmentation for large-scale point cloud
CN107480206B (en) A Question Answering Method for Image Content Based on Multimodal Low-Rank Bilinear Pooling
CN116258719B (en) Method and device for flotation froth image segmentation based on multimodal data fusion
US11328172B2 (en) Method for fine-grained sketch-based scene image retrieval
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
CN110069656B (en) Method for searching three-dimensional model based on two-dimensional picture of generated countermeasure network
CN113254648A (en) Text emotion analysis method based on multilevel graph pooling
CN114119975B (en) Cross-modal instance segmentation method guided by language
CN112132197B (en) Model training, image processing method, device, computer equipment and storage medium
CN111723220A (en) Image retrieval method, device and storage medium based on attention mechanism and hashing
CN113486956B (en) Target segmentation system and its training method, target segmentation method and equipment
WO2021143264A1 (en) Image processing method and apparatus, server and storage medium
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN116206133A (en) RGB-D significance target detection method
CN114219824A (en) Visible light-infrared target tracking method and system based on deep network
CN110633706B (en) Semantic segmentation method based on pyramid network
CN114299305B (en) Saliency target detection algorithm for aggregating dense and attention multi-scale features
CN118643180B (en) Image retrieval method, system, device and storage medium
CN114529649A (en) Image processing method and device
CN113096176A (en) Semantic segmentation assisted binocular vision unsupervised depth estimation method
CN118113899A (en) 3D model quick retrieval method based on 2D image
CN117994623A (en) Image feature vector acquisition method
CN118447312A (en) A tri-modal few-shot object detection method based on interactive fusion and matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant