CN117132500A - Weak light enhancement method based on sparse conversion network - Google Patents
Weak light enhancement method based on sparse conversion network
- Publication number
- CN117132500A CN117132500A CN202311178208.8A CN202311178208A CN117132500A CN 117132500 A CN117132500 A CN 117132500A CN 202311178208 A CN202311178208 A CN 202311178208A CN 117132500 A CN117132500 A CN 117132500A
- Authority
- CN
- China
- Prior art keywords
- feature
- module
- network
- stb
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 238000006243 chemical reaction Methods 0.000 title claims abstract description 12
- 230000006870 function Effects 0.000 claims abstract description 53
- 238000005286 illumination Methods 0.000 claims description 46
- 238000010586 diagram Methods 0.000 claims description 44
- 238000012549 training Methods 0.000 claims description 35
- 230000009466 transformation Effects 0.000 claims description 26
- 239000011159 matrix material Substances 0.000 claims description 18
- 238000010606 normalization Methods 0.000 claims description 16
- 230000000295 complement effect Effects 0.000 claims description 14
- 230000002776 aggregation Effects 0.000 claims description 12
- 238000004220 aggregation Methods 0.000 claims description 12
- 238000000926 separation method Methods 0.000 claims description 10
- 238000012360 testing method Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 8
- 238000004364 calculation method Methods 0.000 claims description 8
- 238000000605 extraction Methods 0.000 claims description 7
- 230000008569 process Effects 0.000 claims description 7
- 230000006835 compression Effects 0.000 claims description 6
- 238000007906 compression Methods 0.000 claims description 6
- 238000013507 mapping Methods 0.000 claims description 6
- 238000003062 neural network model Methods 0.000 claims description 6
- 239000013589 supplement Substances 0.000 claims description 6
- 238000000137 annealing Methods 0.000 claims description 4
- 238000013528 artificial neural network Methods 0.000 claims description 4
- 239000000284 extract Substances 0.000 claims description 4
- 238000007499 fusion processing Methods 0.000 claims description 3
- 230000007246 mechanism Effects 0.000 claims description 3
- 238000011176 pooling Methods 0.000 claims description 3
- 239000000047 product Substances 0.000 claims description 3
- 230000001502 supplementing effect Effects 0.000 claims description 3
- 230000000873 masking effect Effects 0.000 claims 1
- 230000009467 reduction Effects 0.000 claims 1
- 230000000153 supplemental effect Effects 0.000 abstract 1
- 238000013459 approach Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000012800 visualization Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a weak light enhancement method based on a sparse conversion network. Feature selection is performed by a feature selection block STB; a spatial reconstruction unit SRU and a channel reconstruction unit CRU are added to reduce redundancy in the model; an NRetinex module removes inappropriate features from the image; an effective mixed-scale feed-forward network MSFN is added to supplement information; and two mutually complementary loss functions are added to constrain the network to achieve an optimal weak light image enhancement result. The invention addresses the shortcomings of existing weak light enhancement methods.
Description
Technical Field
The invention belongs to the technical field of computer digital image processing, and particularly relates to a weak light enhancement method based on a sparse transform network.
Background
In low-light environments, when the camera captures insufficient light, images always suffer from various distortions such as low contrast, poor visibility and sensor noise. The results are unsatisfactory and affect subsequent human viewing and downstream computer vision tasks. To correct contrast, reveal texture and eliminate sensor noise, considerable effort has been devoted to developing low-light image enhancement algorithms over the last decades.
In recent years, learning-based low-light image enhancement algorithms have achieved great success. Traditional methods, by contrast, generally rely on manually defined prior assumptions that cannot adapt to the variety of complex low-light scenes, so their effectiveness in low-light image enhancement across different scenes is limited. Moreover, many learning-based approaches have adopted various convolutional neural network (CNN) architectures as a preferable choice over traditional algorithms; however, the model's ability to capture non-local information is limited by the inherent nature of the convolution operation, namely the locality of the receptive field and its independence from the input content.
Disclosure of Invention
The invention aims to provide a weak light enhancement method based on a sparse conversion network that addresses the limitations of existing weak light enhancement methods.
The technical scheme adopted by the invention is a weak light enhancement method based on a sparse conversion network: feature selection is performed by a feature selection block STB; a spatial reconstruction unit SRU and a channel reconstruction unit CRU are added to reduce redundancy in the model; an NRetinex module removes inappropriate features from the image; an effective mixed-scale feed-forward network MSFN is added to supplement information; and two mutually complementary loss functions are added to constrain the network to achieve the optimal weak light image enhancement result.
The present invention is also characterized in that,
the method is implemented according to the following steps:
Step 1, design a feature extraction module STB, wherein the STB module comprises a normalization layer LN, an attention selection module TKSA and a module NRetinex for removing inappropriate features; rich features of weak light images captured in complex environments are extracted by stacking STBs with different spatial resolutions and channel dimensions;
Step 2, design an attention selection module TKSA-SC, wherein the TKSA-SC module further comprises a Top-k selection module and a mixed-scale feed-forward network MSFN module, so as to adaptively retain the most useful attention values and reduce redundancy in the feature mapping, thereby achieving better feature aggregation;
Step 3, design a module NRetinex for removing inappropriate features, wherein NRetinex comprises three networks: N-Net, L-Net and R-Net; N-Net removes inappropriate features after feature aggregation, L-Net and R-Net then decompose the input into a reflection map and an illumination map, the NRetinex module adjusts the illumination map, and the adjusted illumination map is recombined with the reflection map to obtain an enhanced image;
Step 4, the loss function comprises two terms, L_MSE and L_SSIM; L_MSE is the mean square error loss, i.e. the mean of the squared errors between the predicted image and the original input image at corresponding points, and L_SSIM is an effective measure of the structural similarity between the generated weak-light-enhanced image and the ground-truth image; a network that outputs the optimal weak light image enhancement is finally obtained;
Step 5, train the effective sparse Transformer network built by the weak light enhancement method based on a sparse transformation network, which comprises the STB blocks, the TKSA-SC blocks and the NRetinex blocks, using the training set of the public benchmark dataset LOL; train for 150,000 iterations, verify the training result and store the neural network model;
Step 6, load the network model trained in step 5, input the test set of the public benchmark dataset LOL into the trained network model, and save the test results to obtain the weak-light-enhanced images.
The step 1 is specifically implemented according to the following steps:
Step 1.1, first, a weak light image I_low is given, and overlapping image blocks are embedded using a 3×3 convolution;
Step 1.2, the image blocks embedded in step 1.1 are sent to the expert hybrid feature compensator MEFC module; a total of N_0 MEFC modules provide complementary feature refinement;
Step 1.3, the STB blocks comprise two parts, STB-I and STB-II; STB-I stacks three STB blocks for encoding and STB-II stacks three STB blocks for decoding. The features refined by the MEFC module in step 1.2 are input into STB-I, which stacks N_i STBs for encoding, N_i ∈ [1, 2, 3]; the features encoded by STB-I are input into a single STB block, the feature result output by that single STB block is input into STB-II, which likewise stacks N_i ∈ [1, 2, 3] STBs for decoding, and finally the feature result decoded by the STB-II blocks is output;
Formally, given the input feature X_{l-1} of the (l-1)-th block, the encoding process of the STB is shown in equations (1) and (2):
X_l' = X_{l-1} + TKSA-SC(LN(X_{l-1}))    (1)
X_l = X_l' + NRetinex(LN(X_l'))    (2)
where l denotes the index of the STB block, X_l denotes the output of NRetinex, X_l' denotes the output of TKSA, X_{l-1} denotes the input feature of the (l-1)-th block, "LN" denotes layer normalization, "TKSA-SC" denotes the TKSA-SC module, and "NRetinex" denotes the NRetinex module;
Step 1.4, the result output after STB-II in step 1.3 is input into the expert hybrid feature compensator MEFC module again, the final feature result is output through a 3×3 convolution, and the original input weak light image I_low is added to this feature result to supplement the image details and form the final output weak-light-enhanced image.
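As an illustration, the step 1 pipeline and the STB residual structure of equations (1) and (2) can be sketched in PyTorch as follows. This is a minimal sketch rather than the invention's exact implementation: the MEFC, TKSA-SC and NRetinex sub-modules are stand-in placeholders, the multi-scale down/up-sampling between STB stages is omitted, and the channel width is an assumed value.

```python
import torch
import torch.nn as nn


class STB(nn.Module):
    # One Sparse Transformer Block implementing equations (1)-(2):
    #   X_l' = X_{l-1} + TKSA-SC(LN(X_{l-1}));  X_l = X_l' + NRetinex(LN(X_l'))
    def __init__(self, dim, tksa_sc=None, nretinex=None):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        # simple placeholders stand in for the real TKSA-SC and NRetinex sub-modules
        self.tksa_sc = tksa_sc or nn.Linear(dim, dim)
        self.nretinex = nretinex or nn.Linear(dim, dim)

    def forward(self, x):                                   # x: (B, H*W, C) token layout
        x = x + self.tksa_sc(self.ln1(x))
        x = x + self.nretinex(self.ln2(x))
        return x


class SparseTransformerNet(nn.Module):
    # Step 1 pipeline: 3x3 patch embedding -> MEFC -> STB-I (encoder) -> single STB
    # -> STB-II (decoder) -> MEFC -> 3x3 convolution -> residual addition of I_low.
    def __init__(self, dim=48, n_stb=3):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, 3, padding=1)
        self.mefc_in = nn.Conv2d(dim, dim, 3, padding=1)              # MEFC stand-in
        self.enc = nn.ModuleList([STB(dim) for _ in range(n_stb)])    # STB-I
        self.mid = STB(dim)
        self.dec = nn.ModuleList([STB(dim) for _ in range(n_stb)])    # STB-II
        self.mefc_out = nn.Conv2d(dim, dim, 3, padding=1)             # MEFC stand-in
        self.proj = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, i_low):                                # i_low: (B, 3, H, W)
        b, _, h, w = i_low.shape
        f = self.mefc_in(self.embed(i_low))
        t = f.flatten(2).transpose(1, 2)                     # to tokens (B, H*W, C)
        for blk in self.enc:
            t = blk(t)
        t = self.mid(t)
        for blk in self.dec:
            t = blk(t)
        f = t.transpose(1, 2).reshape(b, -1, h, w)
        return self.proj(self.mefc_out(f)) + i_low           # detail supplement via residual


print(SparseTransformerNet()(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```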
The step 2 is specifically implemented according to the following steps:
Step 2.1, the attention selection module TKSA-SC is part of the STB module in step 1. Given a query Q, a key K and a value V, all of the same dimension, the output of dot-product attention is generally expressed as:
Att(Q, K, V) = softmax(QK^T / λ)V    (3)
where Att(Q, K, V) denotes the attention mechanism, Q, K and V denote the query, key and value in matrix form, and λ is an optional temperature factor. Multi-head attention is performed on each new Q, K and V to obtain the channel-dimension outputs, the outputs are concatenated, and the final result over all heads is obtained through a linear projection;
in the TKSA module, the TKSA first applies a 1×1 convolution and a 3×3 depth convolution Dw-Conv to perform context coding on the input of the entire STB module in the channel dimension;
Step 2.2, the query Q and the key K obtained from the convolutions in step 2.1 are passed through a spatial reconstruction unit SRU, which comprises separation and reconstruction operations, specifically as follows:
The SRU unit comprises a group normalization layer. The SRU first uses the scaling factors of the group normalization layer to evaluate the information richness of different feature maps: evaluating the query Q from step 2.1 yields the informative weight W_1, and evaluating the key K from step 2.1 yields the non-informative weight W_2.
Then the query Q generated in step 2.1 is multiplied by the informative weight W_1 to obtain a weighted feature, the informative feature X_1w, and multiplied by the non-informative weight W_2 to obtain another weighted feature, the less informative feature X_2w. The query Q is thus split into two parts: X_1w, which carries rich information and strong expressive power, and X_2w, which carries almost no information and is regarded as the redundant part. The key K repeats the same procedure as the query Q, which completes the separation operation of the SRU;
The reconstruction operation adopts cross reconstruction, which fully combines the two differently weighted features and enhances the information flow between them: X_1w is divided into X_11w and X_12w, and likewise X_2w is divided into X_21w and X_22w; X_11w is added to X_22w to obtain X_w1, and X_12w is added to X_21w to obtain X_w2;
The cross-reconstructed features X_w1 and X_w2 are then concatenated to obtain the final spatially refined feature map X_w;
Step 2.3, the value V obtained in step 2.1 is passed through a channel reconstruction unit CRU, which uses a separation-transformation-fusion strategy. The separation operation first divides the channels of the input feature, i.e. the value V, into two parts of αC and (1−α)C channels, respectively, and then uses 1×1 convolutions to compress the channels of the feature maps to improve computational efficiency; through the division and compression operations, the value V from step 2.1 is split into an upper-layer feature X_up and a lower-layer feature X_low;
After the value V from step 2.1 has been separated into the upper-layer feature X_up and the lower-layer feature X_low, the transformation operation feeds the upper-layer feature X_up into the upper transformation stage as a "rich feature extractor": a k×k group-wise convolution GWC operation and a 1×1 point-wise convolution PWC operation are performed on X_up. The two partial results of the GWC operation and the PWC operation are then added to form a combined representative feature map Y_1, so the upper transformation stage can be expressed as:
Y_1 = W_G X_up + W_P1 X_up    (4)
where W_G is the learnable weight matrix of the group-wise convolution GWC and W_P1 is the learnable weight matrix of the point-wise convolution PWC; that is, the upper transformation stage applies the combination of GWC and PWC to the same feature map X_up, finally extracting the representative feature Y_1 while reducing the computational cost;
X_low enters the lower transformation stage, which uses only a 1×1 point-wise convolution PWC to generate a feature map with shallow hidden details that supplements the upper-layer features; finally, the result of the PWC operation is concatenated with the X_low that entered the lower transformation stage to form the output Y_2 of the lower stage, formulated as:
Y_2 = W_P2 X_low ∪ X_low    (5)
where W_P2 is the learnable weight matrix of the point-wise convolution PWC and ∪ is the concatenation operation;
The fusion process first applies pooling to the output features to collect global spatial information S_m with channel statistics, then stacks the upper- and lower-layer global channel descriptors S_1 and S_2 together and generates the feature importance vectors β_1 and β_2 using a channel-wise soft attention operation;
Finally, under the guidance of the feature importance vectors β_1 and β_2, the upper-layer feature Y_1 and the lower-layer feature Y_2 are merged channel by channel along the channel dimension to obtain the final channel-refined feature Y, as follows:
Y = β_1 Y_1 + β_2 Y_2    (6)
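A minimal sketch of the CRU's separation-transformation-fusion strategy of step 2.3 and equations (4)-(6); the split ratio α, squeeze ratio r, group count and kernel size k are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CRU(nn.Module):
    # Channel Reconstruction Unit: split the value V into upper/lower parts,
    # transform them (GWC + PWC vs. PWC + concat), then fuse with soft attention.
    def __init__(self, channels, alpha=0.5, r=2, groups=2, k=3):
        super().__init__()
        self.up_c = int(alpha * channels)
        self.low_c = channels - self.up_c
        up_sq, low_sq = self.up_c // r, self.low_c // r
        self.squeeze_up = nn.Conv2d(self.up_c, up_sq, 1)       # 1x1 channel compression
        self.squeeze_low = nn.Conv2d(self.low_c, low_sq, 1)
        self.gwc = nn.Conv2d(up_sq, channels, k, padding=k // 2, groups=groups)  # group-wise conv
        self.pwc_up = nn.Conv2d(up_sq, channels, 1)             # point-wise conv
        self.pwc_low = nn.Conv2d(low_sq, channels - low_sq, 1)

    def forward(self, v):                                       # v: (B, C, H, W)
        x_up, x_low = torch.split(v, [self.up_c, self.low_c], dim=1)
        x_up, x_low = self.squeeze_up(x_up), self.squeeze_low(x_low)
        y1 = self.gwc(x_up) + self.pwc_up(x_up)                 # equation (4): rich feature Y_1
        y2 = torch.cat([self.pwc_low(x_low), x_low], dim=1)     # equation (5): shallow detail Y_2
        s1 = F.adaptive_avg_pool2d(y1, 1)                       # global channel descriptors S_1, S_2
        s2 = F.adaptive_avg_pool2d(y2, 1)
        beta = torch.softmax(torch.stack([s1, s2]), dim=0)      # importance vectors beta_1, beta_2
        return beta[0] * y1 + beta[1] * y2                      # equation (6): Y


print(CRU(32)(torch.rand(2, 32, 16, 16)).shape)                 # torch.Size([2, 32, 16, 16])
```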
Step 2.4, after the query Q and the key K have passed through the SRU module in step 2.2, the pixel-pair similarity between the query and the key is calculated; during the similarity calculation, the transposed attention matrix M is used and the unnecessary elements assigned lower attention weights are masked out.
Step 2.5, the transposed attention matrix M is passed through the top-k module, which adaptively selects the first k values with the larger contribution scores; the top-k module is designed to remove the comparatively useless values in order to preserve the most useful ones.
Step 2.6, the top-k values obtained in step 2.5 are normalized within each row for the softmax calculation, and for the other elements smaller than the k-th value, the probability at the corresponding position is replaced with 0; through dynamic selection of the parameter k, the attention is thus changed from dense to sparse;
Step 2.7, the softmax result of the query Q and the key K obtained in step 2.6 is multiplied by the value V obtained in step 2.3;
Step 2.8, the product obtained in step 2.7 is passed through the mixed-scale feed-forward network MSFN module to remove some inappropriate features, and the result is finally output through a 1×1 convolution decoding.
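Steps 2.4-2.7 (transposed attention, top-k selection, masking and aggregation) can be sketched as follows; the fixed keep ratio stands in for the dynamically selected k parameter, and the surrounding 1×1/depth-wise convolutions and the MSFN are omitted.

```python
import torch


def topk_sparse_attention(q, k, v, keep_ratio=0.5, temperature=1.0):
    # q, k, v: (B, heads, C_head, N) as produced after the 1x1 and 3x3 depth-wise convolutions.
    attn = (q @ k.transpose(-2, -1)) / temperature       # transposed attention matrix M
    k_keep = max(1, int(attn.shape[-1] * keep_ratio))    # how many scores to keep per row
    kth = attn.topk(k_keep, dim=-1).values[..., -1:]     # k-th largest contribution score
    attn = attn.masked_fill(attn < kth, float("-inf"))   # mask low-contribution positions
    attn = attn.softmax(dim=-1)                          # masked positions get probability 0
    return attn @ v                                      # aggregate the value V


q = k = v = torch.rand(1, 2, 8, 64)
print(topk_sparse_attention(q, k, v).shape)              # torch.Size([1, 2, 8, 64])
```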
The step 3 is specifically implemented according to the following steps:
Step 3.1, after the TKSA-SC module in step 2 outputs its result, the result is passed through the NRetinex module. The NRetinex module first removes inappropriate features from the input features through the N-Net network; N-Net comprises five convolution layers, the activation functions of the first four convolution layers are ReLU and the last layer is a sigmoid layer, which normalizes the output to the range [0, 1]. A feature map i is generated after passing through the N-Net network;
Step 3.2, after the feature map i has passed through the N-Net network in step 3.1 and the unsuitable features in the input have been removed, i is passed through the L-Net network and the R-Net network to generate the illumination map and the reflection map corresponding to the feature map i. The L-Net and R-Net networks are very similar in structure to the N-Net network and likewise comprise five convolution layers, with ReLU activations in the first four convolution layers and a sigmoid layer at the end; however, according to Retinex theory, the output channel of the L-Net network is set to 1 and the output channel of the R-Net network is set to 3, so that the illumination map L and the reflection map R corresponding to i are generated;
Step 3.3, after the illumination map L and the reflection map R have been generated in step 3.2, the NRetinex module adjusts the illumination map and then recombines the adjusted illumination map L with the newly generated reflection map R to obtain the enhanced image;
Step 3.4, the enhanced image generated in step 3.3 is also passed through N_0 expert hybrid feature compensator MEFC modules, which provide complementary feature refinement, to finally obtain the high-quality, clearly output reconstructed image I'.
The formula corresponding to Retinex theory is as follows:
I = L ⊙ R
where I is the result of the element-wise multiplication of the illumination map L and the reflection map R, L is the illumination map, R is the reflection map, and ⊙ denotes element-wise multiplication.
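A minimal sketch of the NRetinex module of step 3, illustrated on a 3-channel input in [0, 1]; the layer width and the power-law illumination adjustment are assumptions standing in for the adjustment step the description leaves unspecified.

```python
import torch
import torch.nn as nn


def five_layer_cnn(in_ch, out_ch, width=32):
    # Shared template of N-Net / L-Net / R-Net: four ReLU convolution layers plus a
    # final sigmoid layer that keeps the output in [0, 1]; the width is an assumption.
    layers, ch = [], in_ch
    for _ in range(4):
        layers += [nn.Conv2d(ch, width, 3, padding=1), nn.ReLU(inplace=True)]
        ch = width
    layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.Sigmoid()]
    return nn.Sequential(*layers)


class NRetinex(nn.Module):
    def __init__(self, gamma=0.4):
        super().__init__()
        self.n_net = five_layer_cnn(3, 3)        # removes inappropriate features
        self.l_net = five_layer_cnn(3, 1)        # illumination map L, output channel 1
        self.r_net = five_layer_cnn(3, 3)        # reflection map R, output channel 3
        self.gamma = gamma

    def forward(self, x):                         # x: (B, 3, H, W)
        i = self.n_net(x)                         # feature map i
        l, r = self.l_net(i), self.r_net(i)       # decomposition into L and R
        l_adj = l.clamp(min=1e-4) ** self.gamma   # illustrative illumination adjustment
        return l_adj * r                          # recombination I = L (element-wise) R


print(NRetinex()(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```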
Step 4 is specifically implemented according to the following steps:
Step 4.1, the loss function comprises two parts, L_MSE and L_SSIM. To optimize the loss function, L_MSE is given a weight of 0.7 and L_SSIM a weight of 0.3.
The L_MSE loss function is formulated as follows:
loss(x, y) = (1/N) Σ_{i=1}^{N} (x_i − y_i)²
where loss(x, y) is the MSE loss function, x is the value predicted by the model, and y is the ground-truth value.
The L_SSIM loss function is based on the structural similarity index, formulated as follows:
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where SSIM(x, y) is computed over a local window centered on (x, y), μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, and C_1 and C_2 are two constants that keep the division stable;
Step 4.2, L_MSE and L_SSIM complement each other, and the network parameters are updated so that the network gradually converges.
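A sketch of the combined objective of step 4 with the stated 0.7/0.3 weighting; using a uniform averaging window for SSIM and defining L_SSIM = 1 − SSIM are simplifying assumptions.

```python
import torch
import torch.nn.functional as F


def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, window=11):
    # Mean SSIM over local windows (uniform window used here for brevity).
    mu_x = F.avg_pool2d(x, window, 1, window // 2)
    mu_y = F.avg_pool2d(y, window, 1, window // 2)
    sigma_x = F.avg_pool2d(x * x, window, 1, window // 2) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, 1, window // 2) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, 1, window // 2) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()


def total_loss(pred, gt):
    # Step 4.1: 0.7 * L_MSE + 0.3 * L_SSIM, the two complementary terms of the objective.
    l_mse = F.mse_loss(pred, gt)
    l_ssim = 1.0 - ssim(pred, gt)
    return 0.7 * l_mse + 0.3 * l_ssim


pred, gt = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(total_loss(pred, gt).item())
```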
Step 5 is specifically implemented according to the following steps:
Step 5.1, a GPU with 12 GB of video memory is selected for training, and the whole framework runs on PyTorch. The network optimizer is the AdamW optimizer; the initial learning rate is set to 3×10^-4 for the first 92k iterations and is then reduced to 1×10^-6 according to a cosine annealing strategy over the remaining 58k iterations. Training proceeds in an end-to-end learning fashion without expensive large-scale pre-training.
Step 5.2, the neural network parameters trained in step 5.1, the number of training epochs, the AdamW optimizer state and the scheduler state are saved to obtain the trained network model.
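The step 5 schedule (AdamW, 3×10^-4 for the first 92k iterations, cosine annealing to 1×10^-6 over the remaining 58k, then saving the model, optimizer and scheduler states) can be sketched as follows; `model`, `train_loader` and `total_loss` are placeholders from the surrounding sketches, and the batch size and checkpoint name are assumptions.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR


def train(model, train_loader, total_loss, device="cuda"):
    model = model.to(device)
    optimizer = AdamW(model.parameters(), lr=3e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=58_000, eta_min=1e-6)
    step = 0
    while step < 150_000:
        for low, gt in train_loader:                 # paired LOL training samples
            low, gt = low.to(device), gt.to(device)
            optimizer.zero_grad()
            loss = total_loss(model(low), gt)
            loss.backward()
            optimizer.step()
            if step >= 92_000:                       # fixed-LR phase, then cosine annealing
                scheduler.step()
            step += 1
            if step >= 150_000:
                break
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "step": step}, "sparse_transformer_lol.pth")
```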
The beneficial effect of the invention is that the weak light enhancement method based on a sparse conversion network performs feature selection through the feature selection block STB, in which a spatial reconstruction unit SRU and a channel reconstruction unit CRU are added to reduce redundancy in the model, the NRetinex module removes inappropriate features from the image, an effective mixed-scale feed-forward network MSFN is added to supplement information, and two mutually complementary loss functions are added to constrain the network to achieve the optimal weak light image enhancement result.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the weak light enhancement method based on sparse transform network of the present invention;
fig. 2 is a schematic diagram of a specific structure of a feature extraction module STB in the weak light enhancement method based on a sparse transform network according to the present invention;
FIG. 3 is a schematic diagram of the structure of the specific connection inside the STB-I and STB-II codec modules in the sparse-transform-network-based dim-light enhancement method of the present invention;
fig. 4 is a schematic structural diagram of the attention selection module TKSA-SC in the weak light enhancement method based on the sparse transform network of the present invention;
fig. 5 is a schematic structural diagram of a module NRetinex for removing inappropriate features in the sparse-transform-network-based dim-light enhancement method of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
With reference to fig. 1, the weak light enhancement method based on a sparse conversion network performs feature selection through a feature selection block STB, adds a spatial reconstruction unit SRU and a channel reconstruction unit CRU to reduce redundancy in the model, removes inappropriate features from the image using an NRetinex module, adds an effective mixed-scale feed-forward network MSFN to supplement information, and adds two mutually complementary loss functions to constrain the network to achieve the optimal weak light image enhancement result.
The method is implemented according to the following steps:
Step 1, design a feature extraction module STB, namely a Sparse Transformer Block; the STB module comprises a normalization layer LN, an attention selection module TKSA and a module NRetinex for removing inappropriate features, and rich features of low-light images captured in complex environments are extracted by stacking STBs with different spatial resolutions and channel dimensions;
the step 1 is specifically implemented according to the following steps:
Step 1.1, first, a weak light image I_low is given, and overlapping image blocks are embedded using a 3×3 convolution;
Step 1.2, the image blocks embedded in step 1.1 are sent to the expert hybrid feature compensator MEFC module; a total of N_0 MEFC modules provide complementary feature refinement;
Step 1.3, the STB blocks comprise two parts, STB-I and STB-II; STB-I stacks three STB blocks for encoding and STB-II stacks three STB blocks for decoding. The features refined by the MEFC module in step 1.2 are input into STB-I, which stacks N_i STBs for encoding, N_i ∈ [1, 2, 3]; the features encoded by STB-I are input into a single STB block, the feature result output by that single STB block is input into STB-II, which likewise stacks N_i ∈ [1, 2, 3] STBs for decoding, and finally the feature result decoded by the STB-II blocks is output. Each STB has its own specific spatial resolution and channel dimension for mining a multi-scale representation of the image in a low-light environment, through which rich features of complex low-light images are extracted;
Formally, given the input feature X_{l-1} of the (l-1)-th block, the encoding process of the STB is shown in equations (1) and (2):
X_l' = X_{l-1} + TKSA-SC(LN(X_{l-1}))    (1)
X_l = X_l' + NRetinex(LN(X_l'))    (2)
where l denotes the index of the STB block, X_l denotes the output of NRetinex, X_l' denotes the output of TKSA, X_{l-1} denotes the input feature of the (l-1)-th block, "LN" denotes layer normalization, "TKSA-SC" denotes the TKSA-SC module, and "NRetinex" denotes the NRetinex module;
Step 1.4, the result output after STB-II in step 1.3 is input into the expert hybrid feature compensator MEFC module again, the final feature result is output through a 3×3 convolution, and the original input weak light image I_low is added to this feature result to supplement the image details and form the final output weak-light-enhanced image.
Step 2, design an attention selection module TKSA-SC, namely Top-k Sparse Attention with SRU and CRU, where SRU denotes the Spatial Reconstruction Unit and CRU denotes the Channel Reconstruction Unit; the TKSA-SC module further comprises a Top-k selection module and a mixed-scale feed-forward network MSFN module, so as to adaptively retain the most useful attention values and reduce redundancy in the feature mapping, thereby achieving better feature aggregation;
Referring to fig. 2 to 5, step 2 is specifically performed as follows:
Step 2.1, the attention selection module TKSA-SC is part of the STB module in step 1. Given a query Q, a key K and a value V, all of the same dimension, the output of dot-product attention is generally expressed as:
Att(Q, K, V) = softmax(QK^T / λ)V    (3)
where Att(Q, K, V) denotes the attention mechanism, Q, K and V denote the query, key and value in matrix form, and λ is an optional temperature factor. Multi-head attention is performed on each new Q, K and V to obtain the channel-dimension outputs, the outputs are concatenated, and the final result over all heads is obtained through a linear projection;
in the TKSA module, the TKSA first applies a 1×1 convolution and a 3×3 depth convolution Dw-Conv to perform context coding on the input of the entire STB module in the channel dimension;
Step 2.2, the query Q and the key K obtained from the convolutions in step 2.1 are passed through a spatial reconstruction unit SRU, which comprises separation and reconstruction operations, specifically as follows:
The SRU unit comprises a group normalization layer. The SRU first uses the scaling factors of the group normalization layer to evaluate the information richness of different feature maps: evaluating the query Q from step 2.1 yields the informative weight W_1, and evaluating the key K from step 2.1 yields the non-informative weight W_2.
Then the query Q generated in step 2.1 is multiplied by the informative weight W_1 to obtain a weighted feature, the informative feature X_1w, and multiplied by the non-informative weight W_2 to obtain another weighted feature, the less informative feature X_2w. The query Q is thus split into two parts: X_1w, which carries rich information and strong expressive power, and X_2w, which carries almost no information and is regarded as the redundant part. The key K repeats the same procedure as the query Q, which completes the separation operation of the SRU;
After the separation operation, in order to further reduce spatial redundancy, the reconstruction operation adds the information-rich features to the information-poor features so that more informative features are generated while space is saved. The two parts are not added directly; instead, a cross reconstruction operation is adopted that fully combines the two differently weighted features and enhances the information flow between them: X_1w is divided into X_11w and X_12w, and likewise X_2w is divided into X_21w and X_22w; X_11w is added to X_22w to obtain X_w1, and X_12w is added to X_21w to obtain X_w2;
The cross-reconstructed features X_w1 and X_w2 are then concatenated to obtain the final spatially refined feature map X_w;
Step 2.3, the value V obtained in step 2.1 is passed through a channel reconstruction unit CRU, which uses a separation-transformation-fusion strategy. The separation operation first divides the channels of the input feature, i.e. the value V, into two parts of αC and (1−α)C channels, respectively, and then uses 1×1 convolutions to compress the channels of the feature maps to improve computational efficiency; through the division and compression operations, the value V from step 2.1 is split into an upper-layer feature X_up and a lower-layer feature X_low;
After the value V from step 2.1 has been separated into the upper-layer feature X_up and the lower-layer feature X_low, the transformation operation feeds the upper-layer feature X_up into the upper transformation stage as a "rich feature extractor": a k×k group-wise convolution GWC operation and a 1×1 point-wise convolution PWC operation are performed on X_up. The two partial results of the GWC operation and the PWC operation are then added to form a combined representative feature map Y_1, so the upper transformation stage can be expressed as:
Y_1 = W_G X_up + W_P1 X_up    (4)
where W_G is the learnable weight matrix of the group-wise convolution GWC and W_P1 is the learnable weight matrix of the point-wise convolution PWC; that is, the upper transformation stage applies the combination of GWC and PWC to the same feature map X_up, finally extracting the representative feature Y_1 while reducing the computational cost;
X_low enters the lower transformation stage, which uses only a 1×1 point-wise convolution PWC to generate a feature map with shallow hidden details that supplements the upper-layer features; finally, the result of the PWC operation is concatenated with the X_low that entered the lower transformation stage to form the output Y_2 of the lower stage, formulated as:
Y_2 = W_P2 X_low ∪ X_low    (5)
where W_P2 is the learnable weight matrix of the point-wise convolution PWC and ∪ is the concatenation operation;
After the transformation operation, fusion is performed. Rather than directly concatenating or adding the two types of features, the output features Y_1 and Y_2 of the upper and lower transformation stages are adaptively combined using a simplified SKNet approach.
The fusion process first applies pooling to the output features to collect global spatial information S_m with channel statistics, then stacks the upper- and lower-layer global channel descriptors S_1 and S_2 together and generates the feature importance vectors β_1 and β_2 using a channel-wise soft attention operation;
Finally, under the guidance of the feature importance vectors β_1 and β_2, the upper-layer feature Y_1 and the lower-layer feature Y_2 are merged channel by channel along the channel dimension to obtain the final channel-refined feature Y, as follows:
Y = β_1 Y_1 + β_2 Y_2    (6)
Step 2.4, after the query Q and the key K have passed through the SRU module in step 2.2, the pixel-pair similarity between the query and the key is calculated; during the similarity calculation, the transposed attention matrix M is used and the unnecessary elements assigned lower attention weights are masked out.
Step 2.5, the transposed attention matrix M is passed through the top-k module (where k is an adjustable parameter that dynamically controls the degree of sparsity and can be obtained by weighted averaging), which adaptively selects the first k values with the larger contribution scores; the purpose of the top-k module is to eliminate the comparatively useless values in order to preserve the most useful ones.
Step 2.6, the top-k values obtained in step 2.5 are normalized within each row for the softmax calculation, and for the other elements smaller than the k-th value, the probability at the corresponding position is replaced with 0; through dynamic selection of the parameter k, the attention is thus changed from dense to sparse;
Step 2.7, the softmax result of the query Q and the key K obtained in step 2.6 is multiplied by the value V obtained in step 2.3;
Step 2.8, the product obtained in step 2.7 is passed through the mixed-scale feed-forward network MSFN module to remove some inappropriate features, and the result is finally output through a 1×1 convolution decoding.
Step 3, design a module NRetinex for removing inappropriate features, wherein NRetinex comprises three networks: N-Net, L-Net and R-Net. N-Net removes inappropriate features after feature aggregation, L-Net and R-Net then decompose the input into a reflection map and an illumination map, the NRetinex module adjusts the illumination map, and the illumination map adjusted by the NRetinex module is recombined with the reflection map to obtain an enhanced image;
the step 3 is specifically implemented according to the following steps:
Step 3.1, after the TKSA-SC module in step 2 outputs its result, the result passes through the NRetinex module. The NRetinex module first removes inappropriate features from the input features through the N-Net network. The network structure of N-Net is very simple: it comprises five convolution layers, the activation functions of the first four convolution layers are ReLU and the last layer is a sigmoid layer, which normalizes the output to the range [0, 1]. A feature map i is generated after passing through the N-Net network;
Step 3.2, after the feature map i has passed through the N-Net network in step 3.1 and the unsuitable features in the input have been removed, i is passed through the L-Net network and the R-Net network to generate the illumination map and the reflection map corresponding to the feature map i. The L-Net and R-Net networks are very similar in structure to the N-Net network and likewise comprise five convolution layers, with ReLU activations in the first four convolution layers and a sigmoid layer at the end; however, according to Retinex theory, the output channel of the L-Net network is set to 1 and the output channel of the R-Net network is set to 3, so that the illumination map L and the reflection map R corresponding to i are generated;
Step 3.3, after the illumination map L and the reflection map R have been generated in step 3.2, the NRetinex module adjusts the illumination map and then recombines the adjusted illumination map L with the newly generated reflection map R to obtain the enhanced image;
Step 3.4, the enhanced image generated in step 3.3 is also passed through N_0 expert hybrid feature compensator MEFC modules, which provide complementary feature refinement, to finally obtain the high-quality, clearly output reconstructed image I'.
The formula corresponding to Retinex theory is as follows:
I = L ⊙ R
where I is the result of the element-wise multiplication of the illumination map L and the reflection map R, L is the illumination map, R is the reflection map, and ⊙ denotes element-wise multiplication.
Step 4, the loss function comprises two terms, L_MSE and L_SSIM; L_MSE is the mean square error loss, i.e. the mean of the squared errors between the predicted image and the original input image at corresponding points, and L_SSIM is an effective measure of the structural similarity between the generated weak-light-enhanced image and the ground-truth image. The two loss functions complement each other and improve each other's performance, and a network that outputs the optimal weak light image enhancement is finally obtained;
step 4 is specifically implemented according to the following steps:
Step 4.1, the loss function comprises two parts, L_MSE and L_SSIM. To optimize the loss function, L_MSE is given a weight of 0.7 and L_SSIM a weight of 0.3.
The L_MSE loss function is formulated as follows:
loss(x, y) = (1/N) Σ_{i=1}^{N} (x_i − y_i)²
where loss(x, y) is the MSE loss function, x is the value predicted by the model, and y is the ground-truth value.
The L_SSIM loss function is based on the structural similarity index, formulated as follows:
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where SSIM(x, y) is computed over a local window centered on (x, y), μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, and C_1 and C_2 are two constants that keep the division stable;
Step 4.2, L_MSE and L_SSIM complement each other, and the network parameters are updated so that the network gradually converges.
Step 5, train the effective sparse Transformer network built by the weak light enhancement method based on a sparse transformation network, which comprises the STB blocks, the TKSA-SC blocks and the NRetinex blocks, using the training set of the public benchmark dataset LOL; train for 150,000 iterations, verify the training result and store the neural network model;
step 5 is specifically implemented according to the following steps:
Step 5.1, a GPU with 12 GB of video memory is selected for training, and the whole framework runs on PyTorch. The network optimizer is the AdamW optimizer; the initial learning rate is set to 3×10^-4 for the first 92k iterations and is then reduced to 1×10^-6 according to a cosine annealing strategy over the remaining 58k iterations. Training proceeds in an end-to-end learning fashion without expensive large-scale pre-training.
Step 5.2, the neural network parameters trained in step 5.1, the number of training epochs, the AdamW optimizer state and the scheduler state are saved to obtain the trained network model.
Step 6, load the network model trained in step 5, input the test set of the public benchmark dataset LOL into the trained network model, and save the test results to obtain the weak-light-enhanced images.
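Step 6 can be sketched as follows; the checkpoint path, the result directory and the test loader are placeholders rather than names used by the invention.

```python
import os
import torch
from torchvision.utils import save_image


def test(model, test_loader, ckpt_path="sparse_transformer_lol.pth", device="cuda"):
    state = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(state["model"])
    model.to(device).eval()
    os.makedirs("results", exist_ok=True)
    with torch.no_grad():
        for idx, (low, _) in enumerate(test_loader):          # LOL test split
            enhanced = model(low.to(device)).clamp(0, 1)
            save_image(enhanced, f"results/enhanced_{idx:04d}.png")
```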
The weak light enhancement method based on a sparse conversion network of the invention performs feature selection through a feature selection block STB, in which a spatial reconstruction unit SRU and a channel reconstruction unit CRU are added to reduce redundancy in the model, an NRetinex module removes inappropriate features from the image, an effective mixed-scale feed-forward network MSFN is added to supplement information, and two mutually complementary loss functions are added to avoid manual assignment and constrain the network to achieve the optimal weak light image enhancement result. The method has practical significance for recovering images close to normal lighting from low-light environments.
Example 1
The invention discloses a weak light enhancement method based on a sparse conversion network, which is implemented according to the following steps:
Step 1, design a feature extraction module STB, wherein the STB module comprises a normalization layer LN, an attention selection module TKSA and a module NRetinex for removing inappropriate features; rich features of weak light images captured in complex environments are extracted by stacking STBs with different spatial resolutions and channel dimensions;
Step 2, design an attention selection module TKSA-SC, wherein the TKSA-SC module further comprises a Top-k selection module and a mixed-scale feed-forward network MSFN module, so as to adaptively retain the most useful attention values and reduce redundancy in the feature mapping, thereby achieving better feature aggregation;
Step 3, design a module NRetinex for removing inappropriate features, wherein NRetinex comprises three networks: N-Net, L-Net and R-Net; N-Net removes inappropriate features after feature aggregation, L-Net and R-Net then decompose the input into a reflection map and an illumination map, the NRetinex module adjusts the illumination map, and the adjusted illumination map is recombined with the reflection map to obtain an enhanced image;
Step 4, the loss function comprises two terms, L_MSE and L_SSIM; L_MSE is the mean square error loss, i.e. the mean of the squared errors between the predicted image and the original input image at corresponding points, and L_SSIM is an effective measure of the structural similarity between the generated weak-light-enhanced image and the ground-truth image; a network that outputs the optimal weak light image enhancement is finally obtained;
Step 5, train the effective sparse Transformer network built by the weak light enhancement method based on a sparse transformation network, which comprises the STB blocks, the TKSA-SC blocks and the NRetinex blocks, using the training set of the public benchmark dataset LOL; train for 150,000 iterations, verify the training result and store the neural network model;
Step 6, load the network model trained in step 5, input the test set of the public benchmark dataset LOL into the trained network model, and save the test results to obtain the weak-light-enhanced images.
Example 2
Step 1, design a feature extraction module STB, wherein the STB module comprises a normalization layer LN, an attention selection module TKSA and a module NRetinex for removing inappropriate features; rich features of weak light images captured in complex environments are extracted by stacking STBs with different spatial resolutions and channel dimensions;
The step 1 is specifically implemented according to the following steps:
Step 1.1, first, a weak light image I_low is given, and overlapping image blocks are embedded using a 3×3 convolution;
Step 1.2, the image blocks embedded in step 1.1 are sent to the expert hybrid feature compensator MEFC module; a total of N_0 MEFC modules provide complementary feature refinement;
Step 1.3, the STB blocks comprise two parts, STB-I and STB-II; STB-I stacks three STB blocks for encoding and STB-II stacks three STB blocks for decoding. The features refined by the MEFC module in step 1.2 are input into STB-I, which stacks N_i STBs for encoding, N_i ∈ [1, 2, 3]; the features encoded by STB-I are input into a single STB block, the feature result output by that single STB block is input into STB-II, which likewise stacks N_i ∈ [1, 2, 3] STBs for decoding, and finally the feature result decoded by the STB-II blocks is output;
Formally, given the input feature X_{l-1} of the (l-1)-th block, the encoding process of the STB is shown in equations (1) and (2):
X_l' = X_{l-1} + TKSA-SC(LN(X_{l-1}))    (1)
X_l = X_l' + NRetinex(LN(X_l'))    (2)
where l denotes the index of the STB block, X_l denotes the output of NRetinex, X_l' denotes the output of TKSA, X_{l-1} denotes the input feature of the (l-1)-th block, "LN" denotes layer normalization, "TKSA-SC" denotes the TKSA-SC module, and "NRetinex" denotes the NRetinex module;
Step 1.4, the result output after STB-II in step 1.3 is input into the expert hybrid feature compensator MEFC module again, the final feature result is output through a 3×3 convolution, and the original input weak light image I_low is added to this feature result to supplement the image details and form the final output weak-light-enhanced image.
Step 2, design an attention selection module TKSA-SC, wherein the TKSA-SC module further comprises a Top-k selection module and a mixed-scale feed-forward network MSFN module, so as to adaptively retain the most useful attention values and reduce redundancy in the feature mapping, thereby achieving better feature aggregation;
Step 3, design a module NRetinex for removing inappropriate features, wherein NRetinex comprises three networks: N-Net, L-Net and R-Net; N-Net removes inappropriate features after feature aggregation, L-Net and R-Net then decompose the input into a reflection map and an illumination map, the NRetinex module adjusts the illumination map, and the adjusted illumination map is recombined with the reflection map to obtain an enhanced image;
the step 3 is specifically implemented according to the following steps:
Step 3.1, after the TKSA-SC module in step 2 outputs its result, the result passes through the NRetinex module. The NRetinex module first removes inappropriate features from the input features through the N-Net network. The network structure of N-Net is very simple: it comprises five convolution layers, the activation functions of the first four convolution layers are ReLU and the last layer is a sigmoid layer, which normalizes the output to the range [0, 1]. A feature map i is generated after passing through the N-Net network;
Step 3.2, after the feature map i has passed through the N-Net network in step 3.1 and the unsuitable features in the input have been removed, i is passed through the L-Net network and the R-Net network to generate the illumination map and the reflection map corresponding to the feature map i. The L-Net and R-Net networks are very similar in structure to the N-Net network and likewise comprise five convolution layers, with ReLU activations in the first four convolution layers and a sigmoid layer at the end; however, according to Retinex theory, the output channel of the L-Net network is set to 1 and the output channel of the R-Net network is set to 3, so that the illumination map L and the reflection map R corresponding to i are generated;
Step 3.3, after the illumination map L and the reflection map R have been generated in step 3.2, the NRetinex module adjusts the illumination map and then recombines the adjusted illumination map L with the newly generated reflection map R to obtain the enhanced image;
Step 3.4, the enhanced image generated in step 3.3 is also passed through N_0 expert hybrid feature compensator MEFC modules, which provide complementary feature refinement, to finally obtain the high-quality, clearly output reconstructed image I'.
The formula corresponding to Retinex theory is as follows:
I = L ⊙ R
where I is the result of the element-wise multiplication of the illumination map L and the reflection map R, L is the illumination map, R is the reflection map, and ⊙ denotes element-wise multiplication.
Step 4, the loss function comprises two terms, L_MSE and L_SSIM; L_MSE is the mean square error loss, i.e. the mean of the squared errors between the predicted image and the original input image at corresponding points, and L_SSIM is an effective measure of the structural similarity between the generated weak-light-enhanced image and the ground-truth image. The two loss functions complement each other and improve each other's performance, and a network that outputs the optimal weak light image enhancement is finally obtained;
Step 5, train the effective sparse Transformer network built by the weak light enhancement method based on a sparse transformation network, which comprises the STB blocks, the TKSA-SC blocks and the NRetinex blocks, using the training set of the public benchmark dataset LOL; train for 150,000 iterations, verify the training result and store the neural network model;
example 3
Step 1, design a feature extraction module STB, namely a Sparse Transformer Block; the STB module comprises a normalization layer LN, an attention selection module TKSA and a module NRetinex for removing inappropriate features, and rich features of low-light images captured in complex environments are extracted by stacking STBs with different spatial resolutions and channel dimensions;
Step 2, design an attention selection module TKSA-SC, namely Top-k Sparse Attention with SRU and CRU, where SRU denotes the Spatial Reconstruction Unit and CRU denotes the Channel Reconstruction Unit; the TKSA-SC module further comprises a Top-k selection module and a mixed-scale feed-forward network MSFN module, so as to adaptively retain the most useful attention values and reduce redundancy in the feature mapping, thereby achieving better feature aggregation;
Step 3, design a module NRetinex for removing inappropriate features, wherein NRetinex comprises three networks: N-Net, L-Net and R-Net; N-Net removes inappropriate features after feature aggregation, L-Net and R-Net then decompose the input into a reflection map and an illumination map, the NRetinex module adjusts the illumination map, and the adjusted illumination map is recombined with the reflection map to obtain an enhanced image;
Step 4, the loss function comprises two terms, L_MSE and L_SSIM; L_MSE is the mean square error loss, i.e. the mean of the squared errors between the predicted image and the original input image at corresponding points, and L_SSIM is an effective measure of the structural similarity between the generated weak-light-enhanced image and the ground-truth image. The two loss functions complement each other and improve each other's performance, and a network that outputs the optimal weak light image enhancement is finally obtained;
Step 4 is specifically implemented according to the following steps:
Step 4.1, the loss function comprises two parts, L_MSE and L_SSIM. To optimize the loss function, L_MSE is given a weight of 0.7 and L_SSIM a weight of 0.3.
The L_MSE loss function is formulated as follows:
loss(x, y) = (1/N) Σ_{i=1}^{N} (x_i − y_i)²
where loss(x, y) is the MSE loss function, x is the value predicted by the model, and y is the ground-truth value.
The L_SSIM loss function is based on the structural similarity index, formulated as follows:
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where SSIM(x, y) is computed over a local window centered on (x, y), μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, and C_1 and C_2 are two constants that keep the division stable;
Step 4.2, L_MSE and L_SSIM complement each other, and the network parameters are updated so that the network gradually converges.
Step 5, train the effective sparse Transformer network built by the weak light enhancement method based on a sparse transformation network, which comprises the STB blocks, the TKSA-SC blocks and the NRetinex blocks, using the training set of the public benchmark dataset LOL; train for 150,000 iterations, verify the training result and store the neural network model;
step 5 is specifically implemented according to the following steps:
Step 5.1, selecting GPU with video memory of 12GB for training, running the whole frame on PyTorch, setting an initial learning rate to be 3X 10 in the first 92k iterations, wherein the network optimizer is an AdamW optimizer -4 Then reduce to 1×10 according to cosine annealing strategy -6 A total of 58k iterations were performed in an end-to-end learning fashion without expensive extensive pre-training.
And 5.2, storing the neural network parameters trained in the step 5.1, the number of training rounds of Epoch, the optimizer ADAM and the scheduler to obtain a trained network model.
Step 6, loading the network model trained in step 5, inputting the test set of the public reference data set LOL into the trained network model, and then storing the test result to obtain the weak-light-enhanced image.
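A possible inference sketch for step 6 follows; the checkpoint name, the LOL test-split path and the output directory are illustrative assumptions.

```python
import glob, os
import torch
from torchvision.io import read_image
from torchvision.utils import save_image

ckpt = torch.load("sparse_transformer_lol.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])                      # `model` as assumed in the training sketch
model.eval()

os.makedirs("results", exist_ok=True)
with torch.no_grad():
    for path in glob.glob("LOL/eval15/low/*.png"):        # assumed LOL test images
        low = read_image(path).float() / 255.0            # CHW uint8 -> float in [0, 1]
        enhanced = model(low.unsqueeze(0)).clamp(0, 1)
        save_image(enhanced, os.path.join("results", os.path.basename(path)))
```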
Claims (7)
1. The weak light enhancement method based on the sparse conversion network is characterized in that feature selection is carried out through a feature selection block STB, a spatial reconstruction unit SRU and a channel reconstruction unit CRU are added to reduce redundancy in the model, improper features in an image are removed by using an NRetinex module, an effective mixed-scale feed-forward network MSFN is added to supplement information, and two mutually complementary loss functions are added, so that the network is constrained to achieve an optimal weak light image enhancement result.
2. The weak light enhancement method based on the sparse conversion network according to claim 1, characterized by the following specific implementation steps:
step 1, designing a feature extraction module STB, wherein the STB module comprises a normalization layer LN, an attention selection module TKSA and a module NRetinex for removing improper features, and extracting rich features of a weak light image with a complex environment by stacking STBs with different spatial resolutions and channel dimensions;
step 2, designing an attention selection module TKSA-SC, wherein the TKSA-SC module further comprises a Top-k selection module and a mixed-scale feed-forward network MSFN module, so as to adaptively retain the most useful attention values and reduce redundancy in the feature mapping, thereby achieving better feature aggregation;
step 3, designing an improper feature removal module NRetinex, wherein the improper feature removal module NRetinex comprises three networks of N-Net, L-Net and R-Net, removing improper features after feature aggregation by using the N-Net, then decomposing input into a reflection diagram and an illumination diagram by using the L-Net and the R-Net, then adjusting the illumination diagram by using the module NRetinex, and recombining the illumination diagram adjusted by the NRetinex module with the reflection diagram to obtain an enhanced image;
step 4, the loss function comprises two terms: L_MSE and L_SSIM. L_MSE is the mean square error loss function, i.e. the mean of the squared errors between the predicted image and the ground truth image at corresponding points; L_SSIM is an effective measure of the structural similarity between the weak-light-enhanced image and the ground truth image; finally a network that outputs the optimal weak light image enhancement is obtained;
step 5, training the effective sparse Transformer network built by the weak light enhancement method based on the sparse conversion network with the training set of the public reference data set LOL for 150000 iterations, verifying the training result and storing the neural network model; the effective sparse Transformer network built by the weak light enhancement method based on the sparse conversion network comprises the STB block, the TKSA-SC block and the NRetinex block;
and step 6, loading the network model trained in step 5, inputting the test set of the public reference data set LOL into the trained network model, and then storing the test result to obtain the weak-light-enhanced image.
3. The weak light enhancement method based on the sparse conversion network according to claim 2, wherein said step 1 is specifically implemented as follows:
Step 1.1, firstly, a weak light image I_low is given, and overlapping image blocks are embedded using a 3×3 convolution;
step 1.2, the image blocks embedded in step 1.1 are sent to the expert hybrid feature compensator MEFC module, where N_0 MEFC modules together provide complementary feature refinement;
step 1.3, the STB block comprises two parts, STB-I and STB-II; STB-I stacks three STB blocks for encoding and STB-II stacks three STB blocks for decoding; the features refined by the MEFC module in step 1.2 are input into STB-I, which stacks N_i STBs for encoding, N_i ∈ [1,2,3]; the features encoded by STB-I are input into a single STB block, the feature result output by this single STB block is then input into STB-II, which likewise stacks N_i ∈ [1,2,3] STBs for decoding, and finally the feature result after STB-II decoding is output;
formally, given the input feature X_{l-1} of the (l-1)-th block, the encoding process of the STB is shown in formulas (1) and (2):
X_l′ = X_{l-1} + TKSA-SC(LN(X_{l-1}))  (1)
X_l = X_l′ + NRetinex(LN(X_l′))  (2)
wherein l denotes the index of the STB block, X_l denotes the output of NRetinex, X_l′ denotes the output of TKSA-SC, X_{l-1} denotes the input feature of the (l-1)-th block, "LN" denotes layer normalization, "TKSA-SC" denotes the TKSA-SC module, and "NRetinex" denotes the NRetinex module;
Step 1.4, the result output after step 1.3 passes through STB-II is input to the expert hybrid feature compensator MEFC module again, the final feature result is output through a 3×3 convolution, and the original weak light image I_low is added to this feature result to supplement image details, forming the final output weak-light-enhanced image.
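For illustration, a minimal PyTorch sketch of the residual structure in formulas (1) and (2) is given below; `tksa_sc` and `nretinex` are assumed sub-modules (sketched later in this document), and the channel-last LayerNorm over token features is an illustrative choice, not a verified detail of the disclosure.

```python
import torch.nn as nn

class STB(nn.Module):
    """Sparse Transformer Block: applies formula (1), then formula (2)."""
    def __init__(self, dim, tksa_sc: nn.Module, nretinex: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)     # LN, applied over channel-last token features
        self.norm2 = nn.LayerNorm(dim)
        self.tksa_sc = tksa_sc             # attention selection branch (assumed module)
        self.nretinex = nretinex           # inappropriate-feature removal branch (assumed module)

    def forward(self, x):                  # x: (B, N, dim) token features
        x = x + self.tksa_sc(self.norm1(x))     # X_l' = X_{l-1} + TKSA-SC(LN(X_{l-1}))
        x = x + self.nretinex(self.norm2(x))    # X_l  = X_l' + NRetinex(LN(X_l'))
        return x
```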
4. The weak light enhancement method based on the sparse conversion network according to claim 3, wherein said step 2 is specifically implemented as follows:
step 2.1, the attention selection module TKSA-SC is part of the STB module in step 1; given a query Q, a key K and a value V of the same dimensions, the output of dot-product attention is generally expressed as:

Att(Q, K, V) = softmax(Q·Kᵀ / λ)·V  (3)

wherein Att(Q, K, V) denotes the attention mechanism, Q, K and V denote the query, key and value in matrix form respectively, and λ is an optional temperature scaling factor; multi-head attention is performed on each new Q, K and V to obtain the channel-dimension outputs, the outputs of all heads are concatenated, and the final result is obtained through a linear projection;
in the TKSA module, a 1×1 convolution and a 3×3 depthwise convolution Dw-Conv are first applied to perform context encoding on the input of the entire STB module in the channel dimension;
Step 2.2, the query Q and the key K obtained from the convolutions in step 2.1 are passed through the spatial reconstruction unit SRU, which comprises a separation operation and a reconstruction operation, specifically as follows:
the SRU unit comprises a group normalization layer; the SRU first uses the scaling factors in the group normalization layer to evaluate the information richness of the different feature maps, the query Q from step 2.1 is evaluated to obtain the informative weight W_1, and the key K from step 2.1 is evaluated to obtain the non-informative weight W_2;
then the query Q generated in step 2.1 is multiplied by the informative weight W_1 to obtain a weighted feature, the informative feature X_{1w}, and the query Q is multiplied by the non-informative weight W_2 to obtain a weighted feature, the less informative feature X_{2w}; the query Q is thus successfully split into two parts: X_{1w}, with rich information and strong expressive power, and X_{2w}, with almost no information, regarded as the redundant part; the key K repeats the same steps as the query Q, which ends the separation operation of the whole SRU;
the reconstruction operation adopts a cross-reconstruction operation that fully combines the two differently weighted features and enhances the information flow between them: X_{1w} is divided into X_{11w} and X_{12w}, and likewise X_{2w} is divided into X_{21w} and X_{22w}; X_{11w} and X_{22w} are added to obtain X_{w1}, and likewise X_{12w} and X_{21w} are added to obtain X_{w2};
the cross-reconstructed features X_{w1} and X_{w2} are then concatenated to obtain the final spatially refined feature map X_w;
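A rough PyTorch sketch of the separate-and-reconstruct operation of step 2.2, applied to a single feature map, is given below; using the group-normalization scaling factors as informativeness scores and a 0.5 threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SRU(nn.Module):
    """Spatial Reconstruction Unit: separate a feature map into informative and
    redundant parts via GroupNorm scaling factors, then cross-reconstruct them."""
    def __init__(self, channels, groups=4, threshold=0.5):
        super().__init__()
        self.gn = nn.GroupNorm(groups, channels)
        self.threshold = threshold

    def forward(self, x):                                   # x: (B, C, H, W), C assumed even
        gamma = self.gn.weight.abs()
        w = (gamma / gamma.sum()).view(1, -1, 1, 1)         # per-channel informativeness score
        info = torch.sigmoid(w * self.gn(x))
        w1 = (info >= self.threshold).float()               # informative weight W_1
        w2 = 1.0 - w1                                       # non-informative weight W_2
        x1w, x2w = w1 * x, w2 * x                           # X_1w, X_2w
        x11, x12 = torch.chunk(x1w, 2, dim=1)               # split each weighted feature
        x21, x22 = torch.chunk(x2w, 2, dim=1)
        xw1, xw2 = x11 + x22, x12 + x21                     # cross-reconstruction
        return torch.cat([xw1, xw2], dim=1)                 # spatially refined X_w
```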
Step 2.3, the value V obtained in step 2.1 is passed through the channel reconstruction unit CRU, which uses a separation-transformation-fusion strategy; the separation operation first divides the channels of the input feature, i.e. the value V part, into two parts of αC channels and (1-α)C channels respectively, and then uses a 1×1 convolution to compress the channels of the feature map to improve computational efficiency; through the division and compression operations, the value V from step 2.1 is divided into an upper-layer feature X_up and a lower-layer feature X_low;
after the value V from step 2.1 is successfully separated into the upper-layer feature X_up and the lower-layer feature X_low, the transformation operation feeds the upper-layer feature X_up into the upper transformation stage as a "rich feature extractor": a k×k group-wise convolution GWC operation and a 1×1 point-wise convolution PWC operation are performed on X_up, and the two partial results of the GWC and PWC operations are added to form a combined representative feature map Y_1; the upper transformation stage can be expressed as:

Y_1 = W_G X_up + W_{P1} X_up  (4)

wherein W_G is the learnable weight matrix of the group-wise convolution GWC and W_{P1} is the learnable weight matrix of the point-wise convolution PWC, i.e. the upper transformation stage applies the combination of GWC and PWC to the same feature map X_up, finally extracting the representative feature Y_1 while reducing the computational cost;
X_low enters the lower transformation stage, where a 1×1 point-wise convolution PWC is used to generate a feature map with shallow hidden details that supplements the upper-layer features; finally, the feature map of the PWC operation in the lower transformation stage is concatenated with X_low to form the output Y_2 of the lower stage, as follows:

Y_2 = W_{P2} X_low ∪ X_low  (5)

wherein W_{P2} is the learnable weight matrix of the point-wise convolution PWC and ∪ is the concatenation operation;
the fusion process first applies pooling to the output features to collect the global spatial information S_m with channel statistics; the global channel descriptors S_1 and S_2 of the upper and lower layers are then stacked together, and a channel-wise soft attention operation is used to generate the feature importance vectors β_1 and β_2;
finally, under the guidance of the feature importance vectors β_1 and β_2, the upper-layer feature Y_1 and the lower-layer feature Y_2 are merged channel by channel in the channel dimension to obtain the final channel-refined feature Y as follows:
Y = β_1 Y_1 + β_2 Y_2  (6)
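A compact sketch of the split-transform-fuse steps of step 2.3 is given below; the split ratio α, the squeeze ratio, the group count and the kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CRU(nn.Module):
    """Channel Reconstruction Unit (separation-transformation-fusion), sketched
    after step 2.3; hyper-parameters below are assumptions, not disclosed values."""
    def __init__(self, channels, alpha=0.5, squeeze=2, groups=2, k=3):
        super().__init__()
        self.up_c = int(alpha * channels)
        self.low_c = channels - self.up_c
        up_sq, low_sq = self.up_c // squeeze, self.low_c // squeeze
        self.squeeze_up = nn.Conv2d(self.up_c, up_sq, 1)     # 1x1 channel compression
        self.squeeze_low = nn.Conv2d(self.low_c, low_sq, 1)
        self.gwc = nn.Conv2d(up_sq, channels, k, padding=k // 2, groups=groups)
        self.pwc1 = nn.Conv2d(up_sq, channels, 1)
        self.pwc2 = nn.Conv2d(low_sq, channels - low_sq, 1)

    def forward(self, v):                                    # v: (B, C, H, W)
        x_up, x_low = torch.split(v, [self.up_c, self.low_c], dim=1)
        x_up, x_low = self.squeeze_up(x_up), self.squeeze_low(x_low)
        y1 = self.gwc(x_up) + self.pwc1(x_up)                # upper stage: Y_1 = GWC + PWC
        y2 = torch.cat([self.pwc2(x_low), x_low], dim=1)     # lower stage: Y_2 = PWC ∪ X_low
        # fusion: pooling -> channel-wise soft attention -> weighted merge, formula (6)
        s = torch.stack([F.adaptive_avg_pool2d(y1, 1), F.adaptive_avg_pool2d(y2, 1)], dim=0)
        beta = torch.softmax(s, dim=0)
        return beta[0] * y1 + beta[1] * y2
```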
Step 2.4, after the query Q and the key K pass through the SRU module in step 2.2, the pixel-pair similarity between the query and the key is calculated, and the transposed attention matrix M is used in this similarity calculation to mask unnecessary elements that are assigned lower attention weights;
step 2.5, the transposed attention matrix M adaptively selects the top k values with the larger contribution scores through a top-k module, the aim of which is to retain the most useful values while removing the comparatively useless ones;
step 2.6, the k values obtained in step 2.5 are normalized within each row for the softmax calculation, and for the other elements smaller than the selected k values, 0 replaces the probability value at the corresponding position, so that the attention can be changed from dense to sparse through the dynamic selection of the parameter k;
step 2.7, the softmax result of the query Q and the key K obtained in step 2.6 is multiplied by the value V obtained in step 2.3;
step 2.8, the result of the multiplication in step 2.7 is passed through the mixed-scale feed-forward network MSFN module to remove some improper features, and is finally output through a 1×1 convolution decoding.
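The sketch below illustrates the transposed top-k sparse attention of steps 2.4-2.7; the keep ratio, the √d-style temperature and the tensor layout are illustrative assumptions, and the SRU, CRU and MSFN stages are omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep_ratio=0.5, temperature=None):
    """Build the channel-wise (transposed) attention matrix M, keep only the
    top-k scores per row, mask the rest with -inf so softmax assigns them 0,
    then aggregate V. q, k, v: (B, heads, C, N); keep_ratio is an assumption."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    if temperature is None:
        temperature = q.shape[-1] ** 0.5
    attn = (q @ k.transpose(-2, -1)) / temperature          # transposed attention matrix M
    n_keep = max(1, int(attn.shape[-1] * keep_ratio))       # dynamic top-k selection
    thresh = attn.topk(n_keep, dim=-1).values[..., -1:]     # k-th largest score per row
    attn = attn.masked_fill(attn < thresh, float("-inf"))   # mask low-scoring entries
    attn = attn.softmax(dim=-1)                             # masked positions become 0
    return attn @ v                                         # sparse aggregation of V
```

Masking before the softmax, rather than after it, keeps each row a valid probability distribution over the retained entries, which is one way to realize the dense-to-sparse change described in step 2.6.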
5. The weak light enhancement method based on the sparse conversion network according to claim 4, wherein said step 3 is specifically implemented as follows:
Step 3.1, after the TKSA-SC module in step 2 outputs its result, the result is passed through the NRetinex module; the NRetinex module first removes inappropriate features from the input features through the N-Net network, which comprises five convolution layers, the activation functions of the first four convolution layers being ReLU and the last layer being a sigmoid layer used to normalize the output to the range [0,1]; a feature map i is generated after the input passes through the N-Net network;
step 3.2, after the feature map i is generated by the N-Net network in step 3.1 with the unsuitable features removed, i is passed through the L-Net network and the R-Net network to generate the illumination map and reflection map corresponding to the feature map i; the L-Net and R-Net networks are very similar in structure to the N-Net network and likewise comprise five convolution layers, with ReLU activation functions for the first four convolution layers and a sigmoid layer as the last layer; however, according to the Retinex theory, the output channel of the L-Net network is set to 1 and the output channel of the R-Net network is set to 3, thereby generating the illumination map L and the reflection map R corresponding to i;
step 3.3, after the illumination map L and the reflection map R are generated in step 3.2, the NRetinex module adjusts the illumination map and then recombines the adjusted illumination map L with the newly generated reflection map R to obtain an enhanced image;
step 3.4, the enhanced image finally generated in step 3.3 is also passed through the N_0 expert hybrid feature compensator MEFC modules, which provide complementary feature refinement to ultimately obtain a high-quality, sharp reconstructed output image I′.
The formula corresponding to the Retinex theory is as follows:

I = L ⊙ R

where I is the result of the element-wise multiplication of the illumination map L and the reflection map R, L is the illumination map, R is the reflection map, and ⊙ denotes element-wise multiplication.
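A minimal sketch of the NRetinex decomposition and recomposition of steps 3.1-3.3 is given below; the five-layer N-Net/L-Net/R-Net structure follows the text, while the layer width, the 3×3 kernels and the gamma-style illumination adjustment are placeholders, since the actual adjustment function is not specified here.

```python
import torch.nn as nn

def five_conv_net(in_ch, out_ch, width=32):
    # Five convolution layers: ReLU after the first four, sigmoid after the last,
    # matching the N-Net / L-Net / R-Net description (width and kernel size assumed).
    layers, ch = [], in_ch
    for _ in range(4):
        layers += [nn.Conv2d(ch, width, 3, padding=1), nn.ReLU(inplace=True)]
        ch = width
    layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.Sigmoid()]
    return nn.Sequential(*layers)

class NRetinex(nn.Module):
    def __init__(self, in_ch=3, gamma=0.5):
        super().__init__()
        self.n_net = five_conv_net(in_ch, in_ch)   # removes inappropriate features
        self.l_net = five_conv_net(in_ch, 1)       # illumination map, 1 output channel
        self.r_net = five_conv_net(in_ch, 3)       # reflection map, 3 output channels
        self.gamma = gamma                         # placeholder illumination adjustment

    def forward(self, x):
        i = self.n_net(x)                          # cleaned feature map i in [0, 1]
        L = self.l_net(i)                          # illumination map L
        R = self.r_net(i)                          # reflection map R
        L_adj = L.clamp(min=1e-4) ** self.gamma    # assumed gamma-style adjustment of L
        return L_adj * R                           # Retinex recomposition I = L ⊙ R
```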
6. The weak light enhancement method based on the sparse conversion network according to claim 5, wherein said step 4 is specifically implemented as follows:
step 4.1, the loss function comprises two parts, L_MSE and L_SSIM; when optimizing the loss function, L_MSE is given a weight of 0.7 and L_SSIM a weight of 0.3,
the L_MSE loss function is formulated as follows:

loss(x, y) = (1/n) Σ_{i=1}^{n} (x_i − y_i)²

where loss(x, y) is the MSE loss function, x_i is the value of the model prediction output at point i, y_i is the corresponding ground truth value, and n is the number of points,
the L_SSIM loss function is based on the structural similarity index, computed as:

SSIM(x, y) = [(2 μ_x μ_y + C_1)(2 σ_{xy} + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)]

wherein SSIM(x, y) is the local structural similarity, computed over a window centered on (x, y); μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_{xy} is the covariance of x and y, and C_1, C_2 are two constants that keep the ratio stable;
step 4.2, L_MSE and L_SSIM complement each other; the network parameters are updated with the combined loss so that the network gradually converges.
7. The weak light enhancement method based on the sparse conversion network according to claim 6, wherein said step 5 is specifically implemented as follows:
step 5.1, selecting a GPU with 12 GB of video memory for training and running the whole framework on PyTorch; the network optimizer is the AdamW optimizer, the initial learning rate is set to 3×10⁻⁴ for the first 92k iterations and is then reduced to 1×10⁻⁶ according to a cosine annealing strategy over the remaining 58k iterations, training in an end-to-end learning fashion;
and step 5.2, storing the neural network parameters trained in step 5.1, the number of training rounds (Epoch), the AdamW optimizer state and the scheduler state to obtain the trained network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311178208.8A CN117132500A (en) | 2023-09-13 | 2023-09-13 | Weak light enhancement method based on sparse conversion network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117132500A true CN117132500A (en) | 2023-11-28 |
Family
ID=88859910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311178208.8A Pending CN117132500A (en) | 2023-09-13 | 2023-09-13 | Weak light enhancement method based on sparse conversion network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117132500A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118657674A (en) * | 2024-08-22 | 2024-09-17 | 山东恒辉软件有限公司 | Weak light image enhancement analysis method based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |