CN118608792B - Mamba-based ultra-light image segmentation method and computer device - Google Patents
- Publication number
- CN118608792B (application CN202411082749.5A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- module
- parallel
- image segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to the field of image segmentation, and in particular to a Mamba-based ultra-light image segmentation method and computer device that maintain excellent segmentation performance while optimizing computing resources to the greatest extent, yielding an ultra-light model better suited to mobile detection equipment. The technical solution comprises: acquiring an original image, preprocessing the original image to obtain an original image set, and dividing the original image set into a training set, a verification set and a test set according to a set proportion; constructing a Mamba-based ultra-light image segmentation model; taking the original images in the training set and the verification set as input to the Mamba-based ultra-light image segmentation model and performing image segmentation training on it; and inputting the original images in the test set into the trained Mamba-based ultra-light image segmentation model to obtain the image segmentation results. The invention is suitable for image segmentation.
Description
Technical Field
The invention relates to the field of image segmentation, in particular to an ultra-light image segmentation method based on Mamba and a computer device.
Background
Conventional image segmentation is typically implemented with deep learning networks built on convolutional or Transformer architectures. Convolution has excellent local feature extraction capability but falls short in establishing long-range dependencies. The self-attention mechanism solves long-range information extraction over a continuous patch sequence, but it also brings a significant computational load. To improve segmentation performance, most methods tend to add modules that increase complexity. However, this is unsuitable for practical application scenarios, especially mobile detection equipment, where limited computational resources make heavy-computation models impractical.
In recent years, state space models (SSMs), represented by Mamba, have become powerful competitors to traditional convolutional neural networks and Transformer architectures. SSMs exhibit linear complexity with respect to input size and memory footprint, which makes them a key foundation for lightweight models. Furthermore, SSMs are adept at capturing long-range dependencies, directly addressing the weakness of convolution in extracting information over long distances. In industrial inspection, practical computing power and memory constraints must often be considered.
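For reference, the standard continuous-time state space formulation that Mamba builds on, and its zero-order-hold discretization, can be written as follows (general SSM background given for context, not a formula reproduced from this patent):

```latex
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)
\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B
h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k, \qquad y_k = C\,h_k
```

Because each step depends only on the previous hidden state, a sequence can be processed with computation and memory that grow linearly in its length, which is the property lightweight Mamba-based models exploit.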
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a Mamba-based ultra-light image segmentation method and computer device that maintain excellent segmentation performance while optimizing computing resources to the greatest extent, yielding an ultra-light model better suited to mobile detection equipment.
To achieve the above object, the present invention adopts the following technical scheme. In a first aspect, the present invention provides a Mamba-based ultra-light image segmentation method, comprising:
S1, acquiring an original image, preprocessing the original image to obtain an original image set, and dividing the original image set into a training set, a verification set and a test set according to a set proportion;
S2, constructing an ultra-light image segmentation model based on Mamba;
The Mamba-based ultra-light image segmentation model mainly comprises an encoder, a decoder, and skip connections between the encoder and the decoder;
The encoder comprises a first residual convolution module, a second residual convolution module, a first parallel convolution module and a second parallel convolution module; the decoder comprises a convolution module, a first parallel vision module, a second parallel vision module and a third parallel vision module; the skip connections perform multi-level, multi-scale information fusion through an attention mechanism module, which mainly consists of a spatial attention mechanism submodule and a channel attention mechanism submodule;
The first residual convolution module and the second residual convolution module have the same structure and mainly consist of three parallel convolution layers. The first parallel convolution module and the second parallel convolution module have the same structure and mainly consist of four parallel layers, each of which mainly consists of three residual-connected branches: the first branch consists of a visual state space block and a skip connection, the second branch consists of a standard convolution with kernel size 3 and a skip connection, and the third branch consists of a standard convolution with kernel size 5 and a skip connection;
The first parallel vision module, the second parallel vision module and the third parallel vision module have the same structure and mainly consist of four parallel layers, each of which mainly consists of a visual state space block and a skip connection. The visual state space block mainly consists of two branches: the first branch mainly consists of a linear layer and a SiLU activation function, and the second branch mainly consists of a linear layer, a depthwise convolution, a SiLU activation function, a state space model and a layer normalization layer; finally, the outputs of the two branches are combined by element-wise multiplication;
S3, taking the original images in the training set and the verification set as input of an ultra-light image segmentation model based on Mamba, and performing image segmentation training on the ultra-light image segmentation model based on Mamba;
S4, inputting the original image in the test set into a trained ultra-light image segmentation model based on Mamba to obtain an image segmentation result.
Further, S3 specifically includes:
Encoder training process: respectively inputting the original image into the 3×3 convolution layer and the 5×5 convolution layer of the first residual convolution module to obtain two corresponding branch results, and merging the two branch results to obtain a first feature map; inputting the original image into the 1×1 convolution layer of the first residual convolution module to obtain a feature map, fusing it with the first feature map, and outputting a fused second feature map; and inputting the second feature map into the second residual convolution module, which outputs a third feature map;
Inputting the third feature map, with C channels, into the layer normalization layer of the first parallel convolution module and dividing it into four corresponding feature maps with C/4 channels each; inputting each feature map into its own parallel layer; splicing the outputs of the three residual-connected branches in each parallel layer with their adjustment factors to obtain three corresponding feature maps; adding the three feature maps of each parallel layer element-wise to obtain four intermediate feature maps with C/4 channels; combining the four intermediate feature maps into a fourth feature map with C channels through a splicing operation; and finally outputting a fifth feature map through the layer normalization layer and a projection operation layer.
Further, S3 specifically further includes:
Decoder training process: inputting the feature map with C channels output by the encoder into the layer normalization layer of the third parallel vision module and dividing it into four corresponding feature maps with C/4 channels each; inputting each feature map into a visual state space block (VSS Block), then performing residual splicing and applying the adjustment factors; obtaining a feature map with C channels through feature map splicing; outputting the corresponding feature map through the layer normalization layer and a projection operation layer; inputting it into a convolution module with kernel size 1; and outputting the segmented image.
Further, the multi-level, multi-scale information fusion performed by the skip connections through the attention mechanism specifically includes:
Firstly, inputting the feature map into the spatial attention mechanism submodule and performing maximum pooling and average pooling separately; splicing the two pooled results along the channel dimension; performing a convolution operation with a convolution layer; limiting the output to the range [0, 1] with a Sigmoid activation function; and multiplying the input feature map by this result and adding it to the input feature map to obtain the spatial attention feature map;
Taking the spatial attention feature map as the input of the channel attention mechanism submodule; performing global average pooling on the input feature map through the channel attention mechanism submodule, compressing its spatial dimensions through adaptive pooling while retaining channel information; calculating global attention weights through a one-dimensional convolution; calculating the attention weight of each channel through a fully connected layer or a convolution layer; limiting the attention weights to the range [0, 1] with a Sigmoid activation function; applying the calculated attention weights to the corresponding input feature map; and adding the input feature map to return the final attention feature map.
In a second aspect, the present invention provides a computer apparatus comprising a memory storing program instructions that, when executed, perform the Mamba-based ultra-lightweight image segmentation method described above.
The beneficial effects of the invention are as follows:
The invention introduces the visual state space block as a basic block to capture extensive contextual information, combines it with the excellent local feature extraction capability of convolution, proposes a parallel processing method that splits channels and applies local convolution, greatly reducing the parameter count and computation, and constructs an asymmetric encoder-decoder structure. An ultra-lightweight model is obtained while excellent segmentation performance is maintained, so that computational resources are optimized to the greatest extent and the method is better suited to mobile detection equipment.
Drawings
FIG. 1 is a flowchart of an ultra-light image segmentation method based on Mamba according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an ultra-light image segmentation model based on Mamba according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a spatial attention mechanism submodule according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a channel attention mechanism submodule according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a first parallel convolution module according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of each parallel layer in the MPL (Multiple Parallel Vision Layer) module, i.e., the first parallel convolution module, provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of each parallel layer in the PVL (Parallel Vision Layer) module, i.e., the first parallel vision module, provided by an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a visual state space block according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
The invention provides a Mamba-based ultra-light image segmentation method, as shown in fig. 1, comprising the following steps:
S1, acquiring an original image, preprocessing the original image to obtain an original image set, and dividing the original image set into a training set, a verification set and a test set according to a set proportion.
S2, constructing an ultra-light image segmentation model based on Mamba.
As shown in fig. 2, the Mamba-based ultra-light image segmentation model has a four-layer structure and mainly comprises an encoder, a decoder, and skip connections between them;
The encoder comprises a first residual convolution module, a second residual convolution module, a first parallel convolution module and a second parallel convolution module; the decoder comprises a convolution module, a first parallel vision module, a second parallel vision module and a third parallel vision module; the skip connections perform multi-level, multi-scale information fusion through an attention mechanism module, which mainly consists of a spatial attention mechanism submodule and a channel attention mechanism submodule, whose structures are shown in figs. 3 and 4, respectively;
The first residual convolution module and the second residual convolution module have the same structure and mainly consist of three parallel convolution layers. The first parallel convolution module and the second parallel convolution module have the same structure and, as shown in fig. 5, mainly consist of four parallel layers. As shown in fig. 6, each parallel layer mainly consists of three residual-connected branches. The first branch consists of a visual state space block and a skip connection; a scale (a scalar value) is introduced to control the scaling of the skip connection, which mitigates the vanishing-gradient problem and accelerates training. The second branch consists of a standard convolution with kernel size 3 and a skip connection, and the third branch consists of a standard convolution with kernel size 5 and a skip connection; scales are likewise introduced to control the scaling of these skip connections.
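As a concrete illustration, the following is a minimal PyTorch-style sketch of one such parallel layer, assuming the scale factors act multiplicatively on the skip connections; the class and parameter names (MPLParallelLayer, scale1, vss_block, and so on) are illustrative rather than taken from the patent, and the visual state space block is injected as a placeholder (a sketch of it follows the next paragraph).

```python
import torch
import torch.nn as nn
from typing import Optional

class MPLParallelLayer(nn.Module):
    """One parallel layer of the parallel convolution module (fig. 6): three
    residual-connected branches (VSS block, 3x3 conv, 5x5 conv), each with a
    learnable scale on its skip connection, fused by element-wise addition."""
    def __init__(self, channels: int, vss_block: Optional[nn.Module] = None):
        super().__init__()
        # Placeholder for the visual state space block (identity if none is given).
        self.vss = vss_block if vss_block is not None else nn.Identity()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        # Learnable scalars controlling the scaling of the skip connections.
        self.scale1 = nn.Parameter(torch.ones(1))
        self.scale2 = nn.Parameter(torch.ones(1))
        self.scale3 = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b1 = self.vss(x) + self.scale1 * x    # branch 1: VSS block + scaled skip
        b2 = self.conv3(x) + self.scale2 * x  # branch 2: 3x3 conv + scaled skip
        b3 = self.conv5(x) + self.scale3 * x  # branch 3: 5x5 conv + scaled skip
        return b1 + b2 + b3                   # element-wise fusion of the three branches
```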
The first parallel vision module, the second parallel vision module and the third parallel vision module are similar in structure to the first parallel convolution module and, as shown in fig. 5, mainly consist of four parallel layers. Unlike the first parallel convolution module, each parallel layer of the parallel vision modules consists of a single residual-connected branch: the two parallel convolution layers are absent and only the visual state space block remains, so that high-level features are preserved while the resolution of the feature map is restored. As shown in fig. 7, each such parallel layer mainly consists of a visual state space block and a skip connection. As shown in fig. 8, the visual state space block mainly consists of two branches: the first branch mainly consists of a linear layer and a SiLU activation function, and the second branch mainly consists of a linear layer, a depthwise convolution, a SiLU activation function, a state space model and a layer normalization layer; finally, the outputs of the two branches are combined by element-wise multiplication.
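A minimal sketch of the visual state space block along the same lines; the channel-last handling for the linear layers, the projection widths, and the placeholder for the state space core (in a real implementation a 2D selective scan such as the SS2D used by VMamba) are assumptions rather than details specified in the patent.

```python
import torch
import torch.nn as nn
from typing import Optional

class VSSBlock(nn.Module):
    """Visual state space block (fig. 8): branch 1 is linear + SiLU; branch 2 is
    linear -> depthwise conv -> SiLU -> state space model -> layer norm; the two
    branches are merged by element-wise multiplication."""
    def __init__(self, channels: int, ssm: Optional[nn.Module] = None):
        super().__init__()
        self.in_proj1 = nn.Linear(channels, channels)
        self.act1 = nn.SiLU()
        self.in_proj2 = nn.Linear(channels, channels)
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                                groups=channels)               # depthwise convolution
        self.act2 = nn.SiLU()
        self.ssm = ssm if ssm is not None else nn.Identity()   # placeholder for the 2D selective scan
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); the linear layers act on the channel dimension.
        y = x.permute(0, 2, 3, 1)                               # (B, H, W, C)
        b1 = self.act1(self.in_proj1(y))                        # branch 1: linear + SiLU
        b2 = self.in_proj2(y).permute(0, 3, 1, 2)               # branch 2: linear ...
        b2 = self.act2(self.dwconv(b2)).permute(0, 2, 3, 1)     # ... depthwise conv + SiLU
        b2 = self.norm(self.ssm(b2))                            # ... SSM core, then layer norm
        return (b1 * b2).permute(0, 3, 1, 2)                    # multiply and restore (B, C, H, W)
```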
S3, taking the original images in the training set and the verification set as input of an ultra-light image segmentation model based on Mamba, and performing image segmentation training on the ultra-light image segmentation model based on Mamba.
S4, inputting the original image in the test set into a trained ultra-light image segmentation model based on Mamba to obtain an image segmentation result.
In one embodiment of the present invention, S3 specifically includes:
Encoder training process: the original image I is input separately into the 3×3 convolution layer and the 5×5 convolution layer of the first residual convolution module to obtain two branch results, which are merged to obtain a first feature map F1; the original image I is then input into the 1×1 convolution layer of the first residual convolution module, the resulting feature map is fused with F1, and a second feature map F2 is output; F2 is input into the second residual convolution module, which outputs a third feature map F3. The operation can be expressed as:
F1 = Cat(Conv3×3(I), Conv5×5(I));
F2 = Add(Conv1×1(I), F1);
where Cat denotes the concatenation (join) operation and Add denotes element-wise addition.
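A minimal sketch of the residual convolution module corresponding to these formulas; splitting the output channels evenly between the 3×3 and 5×5 branches so that their concatenation matches the 1×1 branch is an assumption introduced to make the shapes line up.

```python
import torch
import torch.nn as nn

class ResidualConvModule(nn.Module):
    """Residual convolution module: the 3x3 and 5x5 branches are concatenated
    (Cat) and then fused with the 1x1 branch by element-wise addition (Add)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        assert out_channels % 2 == 0, "each concatenated branch gets half the output channels"
        self.conv3 = nn.Conv2d(in_channels, out_channels // 2, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_channels, out_channels // 2, kernel_size=5, padding=2)
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = torch.cat([self.conv3(x), self.conv5(x)], dim=1)  # Cat of the two branches
        return self.conv1(x) + f1                              # Add with the 1x1 branch
```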
As shown in FIG. 5, the third feature map F3, with C channels, is first input into the layer normalization layer of the first parallel convolution module and then divided into four feature maps with C/4 channels each. Each feature map is input into its own parallel layer (containing the visual state space block and the 3×3 and 5×5 convolutions); within each parallel layer, the outputs of the three residual-connected branches are combined with their scale adjustment factors and added element-wise, giving four intermediate feature maps with C/4 channels. The four intermediate feature maps are combined into a fourth feature map F4 with C channels through a splicing operation, and a fifth feature map F5 is finally output through the layer normalization layer and the projection operation layer.
The operation can be expressed as:
{F3^1, F3^2, F3^3, F3^4} = Chunk4(LayerNorm(F3));
F4 = Cat(PL(F3^1), PL(F3^2), PL(F3^3), PL(F3^4));
F5 = Project(LayerNorm(F4));
where Chunk4 denotes dividing the input feature map into four parts along the channel dimension; LayerNorm denotes layer normalization; PL denotes a parallel layer; Project denotes the projection operation; and Reshape denotes changing the shape of a multi-dimensional array.
The fifth feature map F5 is input into the second parallel convolution module, and a sixth feature map F6 is obtained through the same operations.
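A minimal sketch of the chunk-and-merge wrapper shared by the parallel convolution modules (and, with a VSS-only parallel layer, by the parallel vision modules); the channel-last layer normalization and the use of a linear layer for the projection operation are assumptions, and `make_layer` is an illustrative factory argument rather than a name from the patent.

```python
import torch
import torch.nn as nn
from typing import Callable

class ParallelChunkModule(nn.Module):
    """Layer-normalize, split the C channels into four C/4 groups (Chunk4), run
    each group through its own parallel layer, concatenate, then apply a second
    layer norm and a projection, as in the formulas above."""
    def __init__(self, channels: int, make_layer: Callable[[int], nn.Module]):
        super().__init__()
        assert channels % 4 == 0
        self.norm_in = nn.LayerNorm(channels)
        self.layers = nn.ModuleList([make_layer(channels // 4) for _ in range(4)])
        self.norm_out = nn.LayerNorm(channels)
        self.proj = nn.Linear(channels, channels)   # projection operation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        y = self.norm_in(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        chunks = torch.chunk(y, 4, dim=1)                         # Chunk4 along the channel dim
        y = torch.cat([layer(c) for layer, c in zip(self.layers, chunks)], dim=1)
        y = self.proj(self.norm_out(y.permute(0, 2, 3, 1)))       # LayerNorm, then Project
        return y.permute(0, 3, 1, 2)

# Hypothetical usage: an encoder MPL stage could pass a factory building the three-branch
# parallel layer, e.g. ParallelChunkModule(64, lambda c: MPLParallelLayer(c)), while a
# decoder PVL stage would pass a VSS-only layer instead.
```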
Decoder training process: the sixth feature map F6, with C channels, is first input into the layer normalization layer of the third parallel vision module and then divided into four feature maps with C/4 channels each; each feature map is input into a visual state space block, after which residual splicing and the adjustment factors are applied; a feature map with C channels is obtained through feature map splicing, and a seventh feature map F7 is output through the layer normalization layer and the projection operation layer. F7 is then input into the second parallel vision module, which outputs an eighth feature map F8; F8 is input into the first parallel vision module, which outputs a ninth feature map F9; finally, F9 is input into a convolution module with kernel size 1, which outputs the segmented image.
In one embodiment of the present invention, the multi-level, multi-scale information fusion performed by the skip connections through the attention mechanism specifically includes:
As shown in fig. 3, the feature map is input into the spatial attention mechanism submodule and maximum pooling and average pooling are performed separately; the two pooled results are spliced along the channel dimension; a convolution operation is performed with a one-dimensional convolution layer; the output is limited to the range [0, 1] through a fully connected layer and a Sigmoid activation function; finally, the input feature map is multiplied by this result and added to the input feature map to obtain the spatial attention feature map.
As shown in fig. 4, the output of the spatial attention mechanism submodule is used as the input of the channel attention mechanism submodule. The channel attention mechanism submodule first performs global average pooling on the input feature map, compressing its spatial dimensions through adaptive pooling while retaining channel information; the result is then spliced with the feature maps of the other stages; global attention weights are calculated with a one-dimensional convolution, the attention weight of each channel is calculated with a fully connected layer or convolution layer, and the attention weights are limited to the range [0, 1] with a Sigmoid activation function; the calculated attention weights are applied to the corresponding input feature map, the input feature map is added, and the final attention feature map is returned. The operation can be expressed as:
W = σ(FCi(Conv1D(Concat(GAP(F1), GAP(F2), GAP(F3), GAP(F4)))));
F̃i = W ⊗ Fi + Fi;
where GAP denotes global average pooling; Fi are the feature maps of the different stages obtained from the encoder; Concat denotes the joining operation along the channel dimension; Conv1D denotes a one-dimensional convolution operation; FCi is the fully connected layer of stage i; σ is the Sigmoid function; and ⊗ denotes element-wise multiplication.
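A minimal sketch combining the two attention submodules used on the skip connections; the convolution kernel sizes and the single-stage simplification (the patent splices feature maps from several encoder stages before the one-dimensional convolution) are assumptions.

```python
import torch
import torch.nn as nn

class SkipAttention(nn.Module):
    """Spatial attention (channel-wise max/avg pooling -> concat -> conv ->
    Sigmoid, applied multiplicatively with a residual add) followed by channel
    attention (global average pooling -> 1D convolution across channels ->
    Sigmoid, applied multiplicatively with a residual add)."""
    def __init__(self, spatial_kernel: int = 7, channel_kernel: int = 3):
        super().__init__()
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.channel_conv = nn.Conv1d(1, 1, channel_kernel, padding=channel_kernel // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial attention: pool along the channel dimension and weight each location.
        max_map, _ = x.max(dim=1, keepdim=True)
        avg_map = x.mean(dim=1, keepdim=True)
        s = self.sigmoid(self.spatial_conv(torch.cat([max_map, avg_map], dim=1)))
        x = x * s + x                                    # multiply, then residual add

        # Channel attention: GAP, 1D convolution over the channel axis, weight each channel.
        w = self.gap(x).squeeze(-1).transpose(1, 2)      # (B, 1, C)
        w = self.sigmoid(self.channel_conv(w)).transpose(1, 2).unsqueeze(-1)  # (B, C, 1, 1)
        return x * w + x                                 # multiply, then residual add
```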
In order to verify the competitive performance of the Mamba-based ultra-light image segmentation model at this light weight, a comparison experiment was carried out between the Mamba-based lightweight model of the invention and classical medical image segmentation models. Specifically, the comparison objects include U-Net, VM-UNet, MALUNet, and UltraLight VM-UNet.
The models were evaluated on a segmentation dataset commonly used in medicine. Common indicators of model performance include the Dice Similarity Coefficient (DSC), Sensitivity (SE), Specificity (SP) and Accuracy (ACC). The specific data are shown in Table 1.
Table 1 parameter evaluation table
As shown in Table 1, the parameter count of the model of the invention is 99.94% lower than that of the conventional pure vision Mamba model (VM-UNet), 75.51% lower than the currently lightest vision Mamba model (UltraLight VM-UNet), 99.84% lower than the conventional U-Net model, and 93.14% lower than the MALUNet model. The GFLOPs of the model are 97.28% lower than VM-UNet; although GFLOPs rise slightly compared with UltraLight VM-UNet and MALUNet, the computation increases by only 32.58% relative to UltraLight VM-UNet while the parameter count falls by 75.51%, so the model overall remains superior to the currently lightest vision Mamba model. As for the other indicators, DSC is an accuracy index for evaluating the segmentation result, on which the model of the invention outperforms all of the above models; Sensitivity (SE) measures the ability of the model to correctly identify positive samples and Specificity (SP) the ability to correctly identify negative samples, a pair of opposing indicators; Accuracy measures the overall correct classification of samples. As can be seen from Table 1, the model is superior to the above models on these indicators as well. Despite such large reductions in parameters and GFLOPs, the performance of the model of the invention remains excellent and highly competitive.
The foregoing is merely a preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed herein and is not to be construed as excluding other embodiments; it may be used in various other combinations, modifications and environments and may be altered within the scope of the inventive concept described herein, through the above teachings or through the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.
Claims (7)
1. A Mamba-based ultra-lightweight image segmentation method, comprising:
S1, acquiring an original image, preprocessing the original image to obtain an original image set, and dividing the original image set into a training set, a verification set and a test set according to a set proportion;
S2, constructing an ultra-light image segmentation model based on Mamba;
The Mamba-based ultra-light image segmentation model mainly comprises an encoder, a decoder, and skip connections between the encoder and the decoder;
The encoder comprises a first residual convolution module, a second residual convolution module, a first parallel convolution module and a second parallel convolution module; the decoder comprises a convolution module, a first parallel vision module, a second parallel vision module and a third parallel vision module; the skip connections perform multi-level, multi-scale information fusion through an attention mechanism module, which mainly consists of a spatial attention mechanism submodule and a channel attention mechanism submodule;
The first residual convolution module and the second residual convolution module have the same structure and mainly consist of three parallel convolution layers. The first parallel convolution module and the second parallel convolution module have the same structure and mainly consist of four parallel layers, each of which mainly consists of three residual-connected branches: the first branch consists of a visual state space block and a skip connection, the second branch consists of a standard convolution with kernel size 3 and a skip connection, and the third branch consists of a standard convolution with kernel size 5 and a skip connection;
The first parallel vision module, the second parallel vision module and the third parallel vision module have the same structure and mainly consist of four parallel layers, each of which mainly consists of a visual state space block and a skip connection. The visual state space block mainly consists of two branches: the first branch mainly consists of a linear layer and a SiLU activation function, and the second branch mainly consists of a linear layer, a depthwise convolution, a SiLU activation function, a state space model and a layer normalization layer; finally, the outputs of the two branches are combined by element-wise multiplication;
S3, taking the original images in the training set and the verification set as input of an ultra-light image segmentation model based on Mamba, and performing image segmentation training on the ultra-light image segmentation model based on Mamba;
S4, inputting the original image in the test set into a trained ultra-light image segmentation model based on Mamba to obtain an image segmentation result.
2. The Mamba-based ultra-lightweight image segmentation method as set forth in claim 1, wherein S3 specifically includes:
encoder training process: inputting the original image respectively into the 3×3 convolution layer and the 5×5 convolution layer of the first residual convolution module to obtain two corresponding branch results, and merging the two branch results to obtain a first feature map; fusing the feature map obtained by inputting the original image into the 1×1 convolution layer of the first residual convolution module with the first feature map, and outputting a fused second feature map; and inputting the second feature map into the second residual convolution module, which outputs a third feature map.
3. The Mamba-based ultra-lightweight image segmentation method as set forth in claim 2, wherein the encoder training process further includes:
Inputting the third feature map, with C channels, into the layer normalization layer of the first parallel convolution module and dividing it into four corresponding feature maps with C/4 channels each; inputting each feature map into its own parallel layer; splicing the outputs of the three residual-connected branches in each parallel layer with their adjustment factors to obtain three corresponding feature maps; adding the three feature maps of each parallel layer element-wise to obtain four intermediate feature maps with C/4 channels; combining the four intermediate feature maps into a fourth feature map with C channels through a splicing operation; and finally outputting a fifth feature map through the layer normalization layer and a projection operation layer.
4. The Mamba-based ultra-lightweight image segmentation method as set forth in claim 1, wherein S3 specifically further includes:
Decoder training process: inputting the feature map with C channels output by the encoder into the layer normalization layer of the third parallel vision module and dividing it into four corresponding feature maps with C/4 channels each; inputting each feature map into a visual state space block (VSS Block), then performing residual splicing and applying the adjustment factors; obtaining a feature map with C channels through feature map splicing; outputting the corresponding feature map through the layer normalization layer and a projection operation layer; inputting it into a convolution module with kernel size 1; and outputting the segmented image.
5. The Mamba-based ultra-lightweight image segmentation method as set forth in claim 1, wherein the multi-level, multi-scale information fusion performed by the skip connections through the attention mechanism module specifically includes:
Firstly, inputting the feature map into the spatial attention mechanism submodule and performing maximum pooling and average pooling separately; splicing the two pooled results along the channel dimension; performing a convolution operation with a convolution layer; limiting the output to the range [0, 1] with a Sigmoid activation function; and multiplying the input feature map by this result and adding it to the input feature map to obtain the spatial attention feature map.
6. The Mamba-based ultra-lightweight image segmentation method as set forth in claim 5, wherein the multi-level, multi-scale information fusion performed by the skip connections through the attention mechanism module further includes:
Taking the spatial attention feature map as the input of the channel attention mechanism submodule; performing global average pooling on the input feature map through the channel attention mechanism submodule, compressing its spatial dimensions through adaptive pooling while retaining channel information; calculating global attention weights through a one-dimensional convolution; calculating the attention weight of each channel through a fully connected layer or a convolution layer; limiting the attention weights to the range [0, 1] with a Sigmoid activation function; applying the calculated attention weights to the corresponding input feature map; and adding the input feature map to return the final attention feature map.
7. A computer apparatus comprising a memory storing program instructions that, when executed, perform the Mamba-based ultra-lightweight image segmentation method as set forth in any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411082749.5A CN118608792B (en) | 2024-08-08 | 2024-08-08 | Mamba-based ultra-light image segmentation method and computer device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118608792A CN118608792A (en) | 2024-09-06 |
CN118608792B (en) | 2024-10-01
Family
ID=92557473
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411082749.5A Active CN118608792B (en) | 2024-08-08 | 2024-08-08 | Mamba-based ultra-light image segmentation method and computer device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118608792B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116416432A (en) * | 2023-04-12 | 2023-07-11 | 西南石油大学 | Pipeline weld image segmentation method based on improved UNet |
CN116580192A (en) * | 2023-04-18 | 2023-08-11 | 湖北工业大学 | RGB-D semantic segmentation method and system based on self-adaptive context awareness network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112465830B (en) * | 2020-11-11 | 2024-04-26 | 上海健康医学院 | Automatic segmentation method for polished glass-like lung nodule and computer equipment |
US20240062347A1 (en) * | 2022-08-22 | 2024-02-22 | Nanjing University Of Posts And Telecommunications | Multi-scale fusion defogging method based on stacked hourglass network |
CN118279319A (en) * | 2024-03-19 | 2024-07-02 | 福建理工大学 | Medical image segmentation method based on global attention and multi-scale features |
- 2024-08-08: CN application CN202411082749.5A granted as CN118608792B (active)
Also Published As
Publication number | Publication date |
---|---|
CN118608792A (en) | 2024-09-06 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |