WO2023066173A1 - Image processing method and apparatus, and storage medium and electronic device
- Publication number
- WO2023066173A1 (PCT/CN2022/125573)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- shadow
- output
- neural network
- processed
- Prior art date
Classifications
- G06T5/94 — Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06T5/60 — Image enhancement or restoration using machine learning, e.g. neural networks
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Description
- The present application relates to image processing technologies, and in particular to an image processing method, apparatus, storage medium, and electronic device.
- Current shadow removal methods either fail to remove shadows completely, lose information from the background layer, or run slowly, which makes them impractical for ordinary users.
- An existing shadow removal method uses a neural network consisting of three modules: a global localization module, an appearance modeling module, and a semantic modeling module.
- The global localization module detects the shadow area and obtains its location features;
- the appearance modeling module learns the characteristics of non-shadow areas, so that the network output is consistent with the labeled data (Ground Truth, GT) in those areas;
- the semantic modeling module restores the original content hidden behind shadows.
- However, this method does not directly output the background image with the shadow removed; it outputs the ratio of the shadow image to the background image. The background image must then be obtained by dividing the shadow image by the network output pixel by pixel, which adds considerable computation, and the division can harm numerical stability when values approach zero.
- Embodiments of the present application provide an image processing method, apparatus, storage medium, and electronic device, to at least solve the technical problems in the prior art that removing shadow areas easily causes side effects on the image background layer and places high demands on the hardware platform.
- An image processing method is provided, including: acquiring an image to be processed that contains a shadow area; and inputting the image to be processed into a trained neural network to obtain a shadow-removed image. The neural network comprises a two-stage cascade of a first-level network and a second-level network: the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
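- As an illustration of the two-stage cascade described above, a minimal PyTorch sketch follows. It is a sketch under stated assumptions, not the patent's implementation: `MaskNet`-style and `RemovalNet`-style submodules are hypothetical stand-ins for the UNet-style first-level and second-level networks.

```python
import torch
import torch.nn as nn

class TwoStageShadowRemoval(nn.Module):
    """First-level network predicts a shadow mask; the second-level network
    receives the input image concatenated with that mask and outputs the
    shadow-removed image."""
    def __init__(self, mask_net: nn.Module, removal_net: nn.Module):
        super().__init__()
        self.mask_net = mask_net        # first-level network: image -> mask map
        self.removal_net = removal_net  # second-level network: image + mask -> result

    def forward(self, image: torch.Tensor):
        mask = self.mask_net(image)           # shadow area mask map (1 channel)
        x = torch.cat([image, mask], dim=1)   # concatenate along the channel axis
        deshadowed = self.removal_net(x)      # shadow-removed image
        return deshadowed, mask
```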
- The first-level network includes: a first feature extraction module, comprising a first encoder, for extracting features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, for estimating the shadow area based on the first set of feature data and outputting a shadow area mask map.
- The second-level network includes: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, for obtaining a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, for outputting the shadow-removed image based on the second set of feature data.
- The output of each layer of the first or second decoder is concatenated along the channel axis, through a cross-layer connection, with the output of the corresponding layer of the first or second encoder.
- A multi-scale pyramid pooling module is added on the cross-layer connections between each decoder and the corresponding encoder; this module fuses features of different scales.
- The image processing method further includes: downsampling the image to be processed with an image pyramid algorithm, saving the gradient information of each pyramid level during downsampling to form a Laplacian pyramid; feeding the smallest layer into the trained neural network to obtain an output image; and reconstructing the output image from low resolution to high resolution using the Laplacian pyramid to obtain the shadow-removed image.
- The above image processing method further includes: constructing an initial neural network; and training the initial neural network with sample data to obtain the trained neural network, where the sample data includes real-shot images and synthetic shadow maps, and a synthetic shadow map is composited from a pure shadow map and a shadow-free map using an image synthesis method.
- Compositing the synthetic shadow map from the pure shadow map and the shadow-free map using the image synthesis method includes: acquiring the pure shadow map; acquiring the shadow-free map; and obtaining the synthetic shadow map based on the two.
- Compositing the synthetic shadow map may further include transforming the pure shadow map and obtaining the synthetic shadow map from the transformed pure shadow map and the shadow-free map, where the pixel values of non-shadow areas in the transformed pure shadow map are uniformly set to a fixed value a and the pixel values of shadow areas lie between 0 and a, a being a positive integer.
- The initial neural network also includes a module for classifying the sample data.
- When the sample input is judged to be a real-shot image, the labeled data is a shadow-removed image collected in the real scene, and the parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-removed image serving as labeled data. When the sample input is judged to be a synthetic shadow map, the labeled data includes the shadow-free image and the pure shadow map: the parameters of the first-level network are adjusted according to the difference between the shadow area mask map and the pure shadow map, and the parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-free image.
- The loss function includes at least one of the following: pixel loss, feature loss, structural similarity loss, adversarial loss, shadow edge loss, and shadow brightness loss.
- The pixel loss includes a pixel truncation loss: when the absolute difference between two corresponding pixels in the output image of the initial neural network and the label image is greater than a given threshold, the loss of the two pixels is calculated; when the absolute difference is not greater than the threshold, the difference between the two pixels is ignored.
- The shadow brightness loss makes the brightness difference between the region corresponding to the shadow area in the shadow-removed image output by the neural network and the shadow area in the input image greater than 0, so as to raise the brightness of that region in the shadow-removed image.
- The above image processing method includes: dilating the shadow area mask map to obtain a dilation map; eroding the shadow area mask map to obtain an erosion map; and taking the difference of the dilation map and the erosion map as the shadow/non-shadow boundary region, which is smoothed using TVLoss.
- An image processing apparatus is provided, including: an image acquisition unit for acquiring an image to be processed that contains a shadow area; and a processing unit for receiving the image to be processed and processing it with a trained neural network to obtain a shadow-removed image. The neural network comprises a two-stage cascade of a first-level network and a second-level network: the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- The first-level network includes: a first feature extraction module, comprising a first encoder, for extracting features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, for estimating the shadow area based on the first set of feature data and outputting a shadow area mask map.
- The second-level network includes: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, for obtaining a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, for outputting the shadow-removed image based on the second set of feature data.
- A storage medium is provided, including a stored program, where, when the program runs, a device on which the storage medium resides is controlled to execute any one of the image processing methods described above.
- An electronic device is provided, including a processor and a memory for storing executable instructions of the processor, where the processor is configured to execute any one of the image processing methods described above by executing the instructions.
- This application proposes a fast and effective shadow removal method suitable for mobile terminals such as mobile phones. It captures the characteristics of the physical phenomenon of shadows, synthesizes highly realistic training material, and combines several different loss functions with an effective network structure and modules to achieve better shadow removal.
- In addition, this application uses downsampling and network pruning, so that even high-resolution images can be processed very quickly.
- FIG. 1 is a flowchart of an optional image processing method according to an embodiment of the present application;
- FIG. 2 is a structural diagram of an optional neural network according to an embodiment of the present application;
- FIG. 3 is a flowchart of optional neural network training according to an embodiment of the present application;
- FIG. 4 is a flowchart of an optional image synthesis method according to an embodiment of the present application;
- FIG. 5(a) and FIG. 5(b) are comparison diagrams of the shadow removal effect achieved by the image processing method of an embodiment of the present application;
- FIG. 6 is a structural block diagram of an optional image processing apparatus according to an embodiment of the present application.
- FIG. 1 is a flowchart of an optional image processing method according to an embodiment of the present application. As shown in FIG. 1, the image processing method includes the following steps:
- The neural network comprises a two-stage cascade of a first-level network and a second-level network: the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- As shown in FIG. 2, the neural network comprises a two-stage cascade of a first-level network 20 and a second-level network 22. The first-level network includes a first feature extraction module 200 and a shadow area estimation module 202; the second-level network includes a second feature extraction module 204 and a result map output module 206.
- The first feature extraction module 200 includes a first encoder that extracts features of the image to be processed layer by layer to obtain the first set of feature data.
- The shadow area estimation module 202 is connected to the output of the first feature extraction module 200 and includes a first decoder that estimates the shadow area based on the first set of feature data and outputs the shadow area mask map.
- The second feature extraction module 204 includes a second encoder connected to the output of the first-level network; it receives the shadow area mask map output by the first-level network together with the image to be processed and produces the second set of feature data.
- The result map output module 206 is connected to the output of the second feature extraction module 204 and includes a second decoder that outputs the shadow-removed image based on the second set of feature data.
- The first-level network and the second-level network have the same structure except for the number of input channels; for example, both can be built on the classic segmentation network UNet.
- the outputs of each layer of the two encoders are respectively concatenated with the outputs of the corresponding layers of the two decoders along the channel axis through cross-layer connections.
- The multi-scale pyramid pooling module includes multiple pooling layers with different kernel sizes, convolutional layers, and interpolation upsampling layers. Features of different scales are first extracted by the pooling layers; low-level and/or high-level features are then extracted by the convolutional layers; the interpolation upsampling layers resize these features to match the outputs of the corresponding encoder and decoder layers; finally, everything is concatenated into one feature along the channel axis.
- By fusing features of different scales, the multi-scale pyramid pooling module enhances the generalization of the network, enabling it to achieve good results on shadow maps of different areas and degrees.
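- A minimal PyTorch sketch of such a pyramid pooling module is given below, assuming adaptive average pooling and illustrative bin sizes and channel counts; none of these hyperparameters are specified by the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePyramidPooling(nn.Module):
    """Pool at several scales, convolve each branch, upsample back to the
    input size, and concatenate everything along the channel axis."""
    def __init__(self, in_ch: int, branch_ch: int = 32, bins=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                     # pooling layer at one scale
                nn.Conv2d(in_ch, branch_ch, 1, bias=False),  # convolutional layer
                nn.ReLU(inplace=True),
            )
            for b in bins
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [x] + [
            # interpolation upsampling layer: resize back to the input size
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat(feats, dim=1)  # fuse features of different scales
```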
- The model can be pruned by replacing the convolutional layers in the encoder with grouped convolutions in which each convolution kernel convolves only one channel, reducing the model's computation and increasing processing speed.
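- The replacement can be sketched as follows; the channel counts are arbitrary examples, and pairing the depthwise convolution with a 1x1 pointwise convolution to restore cross-channel mixing is a common design choice rather than something the source specifies.

```python
import torch.nn as nn

in_ch, out_ch = 64, 64

# Standard convolution: in_ch * out_ch * 3 * 3 weights.
dense = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# Grouped (depthwise) convolution: with groups=in_ch, each kernel convolves
# a single channel, cutting the 3x3 weight count by a factor of in_ch; the
# following 1x1 convolution mixes information across channels again.
depthwise = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),
    nn.Conv2d(in_ch, out_ch, kernel_size=1),
)
```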
- An instance normalization layer is added after the convolutional layers of the encoder and decoder to normalize the features, thereby improving the shadow removal effect.
- In an optional embodiment, the image pyramid algorithm first downsamples the image to be processed, saving the gradient information of each level during downsampling to form a Laplacian pyramid. The layer with the smallest size is then fed into the trained neural network to obtain an output image, and the Laplacian pyramid is finally used to reconstruct the output image. Because the gradient information inside shadow areas is weak, restoring some gradient information of the original image during reconstruction does not affect the shadow removal result.
- In this way the image is reconstructed from the gradient information of all pyramid levels saved during downsampling, eliminating shadows without reducing image resolution.
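- A sketch of this pyramid decomposition and reconstruction follows, assuming OpenCV's pyrDown/pyrUp and an 8-bit input; the level count is an arbitrary example.

```python
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Downsample repeatedly, keeping each level's detail (gradient) layer."""
    gauss, laps = img, []
    for _ in range(levels):
        down = cv2.pyrDown(gauss)
        up = cv2.pyrUp(down, dstsize=(gauss.shape[1], gauss.shape[0]))
        laps.append(gauss.astype(np.float32) - up.astype(np.float32))
        gauss = down
    # `gauss` is the smallest layer, sent to the network; `laps` rebuild detail.
    return gauss, laps

def reconstruct(base: np.ndarray, laps) -> np.ndarray:
    """Rebuild from low resolution to high resolution using the saved layers."""
    out = base.astype(np.float32)
    for lap in reversed(laps):
        out = cv2.pyrUp(out, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return np.clip(out, 0, 255).astype(np.uint8)
```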
- The image processing method also includes constructing an initial neural network, and:
- S302: training the initial neural network with sample data to obtain the trained neural network, where the sample data includes real-shot images and synthetic shadow maps, and a synthetic shadow map is composited from a pure shadow map and a shadow-free map.
- The sample data used to train the initial neural network plays a vital role in the whole image processing method. There are two main ways to obtain it: real-scene acquisition and image synthesis.
- For real-scene acquisition, the collectors select a light environment and subjects according to the scene category (for example, different lighting scenes: warm light, cold light, daylight, etc.), fix the mobile phone, camera, or other shooting device on a tripod, and adjust the light direction and focal length. Using palms, mobile phones, or other common objects as occluders, they cast a shadow on the subject and shoot to obtain a shadow image, then remove the occluder and shoot again to obtain a shadow-free background image, yielding paired sample data.
- In addition, image synthesis methods can be used to generate realistic synthetic shadow maps for training the neural network.
- The image synthesis method includes:
- Under a preset light environment, the data collector lays a piece of white paper on a desktop and uses a palm, mobile phone, or other common object to block the light, leaving a pure shadow map S on the white paper, where all or part of S is a shadow area;
- Optionally, the pure shadow map S is transformed into S′ by uniformly setting the pixel values of its non-shadow areas to a fixed value a and leaving the pixel values of its shadow areas between 0 and a, where a is a positive integer.
- The data collectors then photograph shadow-free maps B of various objects under the same light environment.
- The pure shadow map S (or the transformed pure shadow map S′) is multiplied pixel by pixel with the shadow-free map B to obtain a synthetic shadow map.
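- A sketch of this pixel-wise composition follows. Normalizing the shadow map by the fixed value a, so that non-shadow pixels of B pass through unchanged, is our reading of the transform described above rather than an explicit formula from the source.

```python
import numpy as np

def synthesize_shadow(shadow_map: np.ndarray, background: np.ndarray,
                      a: int = 255) -> np.ndarray:
    """Pixel-wise multiply a (transformed) pure shadow map S' with a
    shadow-free map B; S' equals a outside the shadow, so those regions of
    B are unchanged, while shadow pixels (values in 0..a) darken B."""
    s = shadow_map.astype(np.float32) / a
    composite = background.astype(np.float32) * s
    return np.clip(composite, 0, 255).astype(np.uint8)
```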
- This image synthesis method takes into account the weakening effect of shadows on light; it handles shadows with gentle edge transitions well and produces a strong sense of realism.
- The initial neural network also includes a module for classifying the sample data.
- When the input sample is judged to be a real-shot image, the labeled data (Ground Truth, GT) is the shadow-removed image collected in the real scene. Since no shadow area mask label exists for a real-shot image, only the parameters of the second-level network 22 are adjusted, based on the difference between the shadow-removed image output by the initial neural network and the GT. When the input sample is judged to be a synthetic shadow map, the labeled data includes the shadow-free image and the pure shadow map collected in the real scene: the parameters of the first-level network 20 are adjusted according to the difference between the shadow area mask map and the pure shadow map, and the parameters of the second-level network 22 are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-free image.
- The sample data acquisition may also include one or more augmentation processes, such as random flipping, rotation, color temperature adjustment, channel swapping, and adding random noise to the collected samples, which enriches the sample data and increases the robustness of the network.
- When performing supervised training on the initial neural network, the loss function includes at least one of the following: pixel loss, feature loss, structural similarity loss, and adversarial loss.
- The pixel loss function measures the similarity of two images at the pixel level and mainly includes an image pixel value loss and a gradient loss. In this embodiment it refers to the weighted sum of the mean squared error between the pixel values of the initial neural network's output image and the label image and the L1-norm error between the gradients of the two images.
- The pixel loss supervises the training process at the pixel level, so that each pixel of the network's output image is as close as possible to the corresponding pixel of the label image.
- On this basis, a pixel truncation loss can be introduced to truncate the pixel loss: the loss for a pair of pixels is calculated only when their absolute difference is greater than a given threshold; otherwise the difference is ignored. Adding the pixel truncation loss guides the network to focus on shadow areas and suppresses image noise; it not only enhances the shadow removal effect but also greatly accelerates network convergence.
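- A minimal sketch of such a truncated pixel loss; the threshold value is an arbitrary example (images assumed normalized to [0, 1]).

```python
import torch

def truncated_pixel_loss(output: torch.Tensor, target: torch.Tensor,
                         threshold: float = 0.05) -> torch.Tensor:
    """Count a pixel pair toward the loss only when its absolute difference
    exceeds the threshold, so training focuses on shadow regions instead of
    small global differences and background noise."""
    diff = (output - target).abs()
    mask = (diff > threshold).float()          # 1 where the difference is kept
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)
```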
- The feature loss mainly refers to the weighted sum of the L1-norm errors between corresponding features of the initial neural network's output image and the label image.
- Specifically, a VGG19 network pre-trained on the ImageNet dataset is used as a feature extractor. The output image of the initial neural network and the label image are each fed through the extractor to obtain the features of every VGG19 layer; the L1-norm errors between corresponding features of the two images are then computed and summed with weights.
- The per-layer features of VGG19 are insensitive to image details and noise and have good semantic characteristics. Even if the images contain defects such as noise or misalignment, the feature loss can still accurately produce meaningful differences in shadow areas; it compensates for the pixel loss's sensitivity to noise and provides good stability.
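- A sketch of such a VGG19 feature loss; the layer indices and weights are illustrative choices, and ImageNet input normalization is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class VGGFeatureLoss(nn.Module):
    """Weighted sum of L1 distances between VGG19 features of the network
    output and the label image."""
    def __init__(self, layer_ids=(3, 8, 17, 26), weights=(1.0, 0.75, 0.5, 0.25)):
        super().__init__()
        self.vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)            # frozen feature extractor
        self.layer_ids = set(layer_ids)
        self.layer_weights = dict(zip(layer_ids, weights))

    def forward(self, output: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        loss, x, y = 0.0, output, label
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)          # run both images through VGG19
            if i in self.layer_ids:
                loss = loss + self.layer_weights[i] * (x - y).abs().mean()
        return loss
```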
- The structural similarity loss function measures the similarity of two images from their global features. In this embodiment it mainly refers to the global differences in brightness and contrast between the initial neural network's output image and the label image. Adding this loss effectively suppresses color cast in the network output and improves the overall quality of the image.
- The adversarial loss mainly refers to the loss between the discriminator's output and the true category of the output image.
- As training progresses, the effects of the pixel loss, feature loss, and structural similarity loss gradually diminish and network convergence slows.
- A discriminator network is therefore trained synchronously to assist the training of the main network.
- The output image of the initial neural network and the label image are sent to the discriminator, which judges whether each is the label image; the discriminator's parameters are updated from the loss between its outputs and the true categories. The discriminator's judgment of the output image is then taken as the authenticity loss of the output image, and the parameters of the initial neural network are updated with that loss.
- Training ends when the discriminator cannot distinguish between the output image of the initial neural network and the label image.
- The adversarial loss effectively eliminates side effects introduced by network processing (for example, color inconsistency between shadow and non-shadow areas, residual shadows, etc.) and improves the realism of the network's output images.
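- Anticipating the note below that a Wasserstein GAN supplies the adversarial loss, a minimal sketch of the two Wasserstein loss terms follows; the Lipschitz constraint (gradient penalty or weight clipping) needed in practice is omitted.

```python
import torch

def wgan_losses(critic, real: torch.Tensor, fake: torch.Tensor):
    """Wasserstein critic and generator losses. `critic` maps an image batch
    to a batch of scalar scores."""
    d_loss = critic(fake.detach()).mean() - critic(real).mean()  # critic minimizes
    g_loss = -critic(fake).mean()  # generator pushes its outputs toward "real"
    return d_loss, g_loss
```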
- Threshold truncation loss: because of lighting, paired data collected in real scenes may show slight brightness differences and color shifts even in non-shadow areas; such differences are acceptable to users and need not be corrected. To keep the network's attention from settling on these small global differences during training, the method introduces a threshold truncation loss: a difference between the network output and the GT is included in the gradient of the overall loss only when it exceeds a given threshold; otherwise the loss is taken to be 0. This loss tolerates slight differences between the network output and the GT and shifts the focus of learning to regions with large differences, effectively improving the network's ability to eliminate obvious shadows.
- Shadow edge loss: first, dilate the shadow area mask map to obtain a dilation map; second, erode the shadow area mask map to obtain an erosion map; then take the difference of the dilation map and the erosion map as the shadow/non-shadow boundary region and smooth it with TVLoss, producing an effective transition between shadow and non-shadow areas.
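- A sketch of the boundary-region construction and the TV smoothing term, assuming an 8-bit 0/255 mask and an illustrative kernel size.

```python
import cv2
import numpy as np
import torch

def shadow_boundary_region(mask: np.ndarray, k: int = 7) -> np.ndarray:
    """Boundary band = dilated mask minus eroded mask."""
    kernel = np.ones((k, k), np.uint8)
    return cv2.subtract(cv2.dilate(mask, kernel), cv2.erode(mask, kernel))

def tv_loss(region: torch.Tensor) -> torch.Tensor:
    """Total-variation smoothing term applied to pixels in the boundary band."""
    dh = (region[..., 1:, :] - region[..., :-1, :]).abs().mean()
    dw = (region[..., :, 1:] - region[..., :, :-1]).abs().mean()
    return dh + dw
```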
- Shadow brightness loss: makes the brightness difference between the region corresponding to the shadow area in the shadow-removed map output by the neural network and the shadow area in the input image greater than 0, raising the brightness of that region in the shadow-removed image.
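- A minimal sketch of such a brightness constraint; the exact form of the penalty is an assumption, since the source only states that the brightness difference should be greater than 0.

```python
import torch

def shadow_brightness_loss(output: torch.Tensor, inp: torch.Tensor,
                           mask: torch.Tensor) -> torch.Tensor:
    """Penalize shadow-region pixels that are not brighter in the output
    than in the input; `mask` is the (N, 1, H, W) shadow area mask."""
    diff = (output - inp) * mask               # brightness change inside the mask
    return torch.relu(-diff).sum() / mask.sum().clamp(min=1.0)
```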
- The background-layer output module of the initial neural network uses the weighted sum of all the above losses as the total loss, with a Wasserstein generative adversarial network providing the adversarial loss.
- The network structure extracts both the global and local features of the input image, improving the degree of shadow elimination while protecting non-shadow areas from side effects.
- Fig. 5(a) and Fig. 5(b) are the comparison diagrams of the processing effects realized by the image processing method of the embodiment of the present application, wherein Fig. 5(a) is an image to be processed containing shadows, and Fig. 5(b) is the processed image after The shadow-removed image processed by the image processing method can be seen from the comparison of the two images.
- the image processing method provided by this application can effectively eliminate the shadow without causing significant side effects on the background layer.
- The neural network structure and loss functions used in the embodiments of the present application can also be applied in scenarios such as shadow removal and rain or fog removal. They are mainly intended for high-resolution images taken by mobile terminals such as mobile phones, but are equally applicable to images of various resolutions on PCs or other embedded devices.
- An electronic device is provided, including a processor and a memory for storing executable instructions of the processor, where the processor is configured to execute any of the image processing methods described above.
- The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute any one of the image processing methods described above.
- An image processing apparatus is also provided.
- FIG. 6 is a structural block diagram of an optional image processing apparatus according to an embodiment of the present application.
- As shown in FIG. 6, the image processing apparatus 60 includes an image acquisition unit 600 and a processing unit 602.
- The image acquisition unit 600 is configured to acquire the image to be processed that contains the shadow area.
- The processing unit 602 is configured to receive the image to be processed and process it using a trained neural network to obtain a shadow-removed image, where the neural network comprises a two-stage cascade of a first-level network and a second-level network, and the image to be processed and the output of the first-level network are input to the second-level network simultaneously.
- The structure of the neural network is shown in FIG. 2 and the related description herein, and will not be repeated here.
- The disclosed technical content can be realized in other ways.
- The device embodiments described above are only illustrative.
- The division of the units may be a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of units or modules may be electrical or in other forms.
- The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- Each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- The above integrated units can be implemented in the form of hardware or in the form of software functional units.
- If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- On this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the various embodiments of the present application.
- The aforementioned storage media include various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
Claims (18)
- 1. An image processing method, comprising: acquiring an image to be processed that contains a shadow area; and inputting the image to be processed into a trained neural network to obtain a shadow-removed image; wherein the neural network comprises a two-stage cascade of a first-level network and a second-level network, the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- 2. The image processing method according to claim 1, wherein the first-level network comprises: a first feature extraction module, comprising a first encoder, configured to extract features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, configured to estimate the shadow area based on the first set of feature data and output a shadow area mask map.
- 3. The image processing method according to claim 1, wherein the second-level network comprises: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, configured to obtain a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, configured to output the shadow-removed image based on the second set of feature data.
- 4. The image processing method according to claim 2 or 3, wherein the output of each layer of the first decoder or the second decoder is concatenated along the channel axis, through a cross-layer connection, with the output of the corresponding layer of the first encoder or the second encoder, and a multi-scale pyramid pooling module is added on the cross-layer connection between the first decoder or the second decoder and the first encoder or the second encoder, the multi-scale pyramid pooling module fusing features of different scales.
- 5. The image processing method according to claim 1, wherein, after acquiring the image to be processed that contains the shadow area, the image processing method further comprises: downsampling the image to be processed using an image pyramid algorithm, and saving the gradient information of each pyramid level during downsampling to form a Laplacian pyramid; feeding the layer with the smallest size into the trained neural network to obtain an output image; and reconstructing the output image from low resolution to high resolution using the Laplacian pyramid to obtain the shadow-removed image.
- 6. The image processing method according to claim 1, further comprising: constructing an initial neural network; and training the initial neural network with sample data to obtain the trained neural network, wherein the sample data includes real-shot images and synthetic shadow maps, and a synthetic shadow map is composited from a pure shadow map and a shadow-free map using an image synthesis method.
- 7. The image processing method according to claim 1, wherein compositing the synthetic shadow map from the pure shadow map and the shadow-free map using the image synthesis method comprises: acquiring the pure shadow map; acquiring the shadow-free map; and obtaining the synthetic shadow map based on the pure shadow map and the shadow-free map.
- 8. The image processing method according to claim 7, wherein compositing the synthetic shadow map from the pure shadow map and the shadow-free map using the image synthesis method further comprises: transforming the pure shadow map, and obtaining the synthetic shadow map based on the transformed pure shadow map and the shadow-free map, wherein pixel values of non-shadow areas in the transformed pure shadow map are uniformly set to a fixed value a, and pixel values of shadow areas are values between 0 and a, a being a positive integer.
- 9. The image processing method according to claim 7, wherein the initial neural network further comprises a module for classifying the sample data; when the sample data input to the initial neural network is judged to be a real-shot image, the labeled data is a shadow-removed image collected in the real scene, and parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-removed image serving as the labeled data; when the sample data input to the initial neural network is judged to be a synthetic shadow map, the labeled data includes the shadow-free image and the pure shadow map collected in the real scene, parameters of the first-level network are adjusted according to the difference between the shadow area mask map and the pure shadow map, and parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-free image.
- 10. The image processing method according to claim 6, wherein, when the initial neural network is trained with the sample data, the loss function includes at least one of the following: pixel loss, feature loss, structural similarity loss, adversarial loss, shadow edge loss, and shadow brightness loss.
- 11. The image processing method according to claim 10, wherein the pixel loss includes a pixel truncation loss: when the absolute difference between two corresponding pixels in the output image of the initial neural network and the label image is greater than a given threshold, the loss of the two pixels is calculated; when the absolute difference between the two corresponding pixels is not greater than the given threshold, the difference between the two pixels is ignored.
- 12. The image processing method according to claim 10, wherein the shadow brightness loss makes the brightness difference between the region corresponding to the shadow area in the shadow-removed map output by the neural network and the shadow area in the input image to be processed greater than 0, so as to increase the brightness of the region corresponding to the shadow area in the shadow-removed image.
- 13. The image processing method according to claim 10, wherein, when the loss function includes the shadow edge loss, the image processing method comprises: dilating the shadow area mask map to obtain a dilation map; eroding the shadow area mask map to obtain an erosion map; and taking the difference of the dilation map and the erosion map as the shadow/non-shadow boundary region and smoothing it using TVLoss.
- 14. An image processing apparatus, comprising: an image acquisition unit configured to acquire an image to be processed that contains a shadow area; and a processing unit configured to receive the image to be processed and process it using a trained neural network to obtain a shadow-removed image; wherein the neural network comprises a two-stage cascade of a first-level network and a second-level network, the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- 15. The image processing apparatus according to claim 14, wherein the first-level network comprises: a first feature extraction module, comprising a first encoder, configured to extract features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, configured to estimate the shadow area based on the first set of feature data and output a shadow area mask map.
- 16. The image processing apparatus according to claim 14, wherein the second-level network comprises: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, configured to obtain a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, configured to output the shadow-removed image based on the second set of feature data.
- 17. A storage medium, comprising a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to execute the image processing method according to any one of claims 1 to 13.
- 18. An electronic device, comprising: a processor; and a memory configured to store executable instructions of the processor; wherein the processor is configured to execute, by executing the executable instructions, the image processing method according to any one of claims 1 to 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020247015956A KR20240089729A (en) | 2021-10-18 | 2022-10-17 | Image processing methods, devices, storage media and electronic devices |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111210502.3 | 2021-10-18 | ||
CN202111210502.3A CN116012232A (en) | 2021-10-18 | 2021-10-18 | Image processing method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023066173A1 true WO2023066173A1 (en) | 2023-04-27 |
Family
ID=86019717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/125573 WO2023066173A1 (en) | 2021-10-18 | 2022-10-17 | Image processing method and apparatus, and storage medium and electronic device |
Country Status (3)
Country | Link |
---|---|
KR (1) | KR20240089729A (en) |
CN (1) | CN116012232A (en) |
WO (1) | WO2023066173A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117575976B (en) * | 2024-01-12 | 2024-04-19 | 腾讯科技(深圳)有限公司 | Image shadow processing method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180012101A1 (en) * | 2016-07-08 | 2018-01-11 | Xerox Corporation | Shadow detection and removal in license plate images |
CN111626951A (en) * | 2020-05-20 | 2020-09-04 | 武汉科技大学 | Image shadow elimination method based on content perception information |
CN112819720A (en) * | 2021-02-02 | 2021-05-18 | Oppo广东移动通信有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN112991329A (en) * | 2021-04-16 | 2021-06-18 | 浙江指云信息技术有限公司 | Image shadow detection and elimination method based on GAN |
CN113222845A (en) * | 2021-05-17 | 2021-08-06 | 东南大学 | Portrait external shadow removing method based on convolution neural network |
- 2021-10-18: CN application CN202111210502.3A filed; publication CN116012232A (en), status: active, Pending
- 2022-10-17: KR application KR1020247015956A; publication KR20240089729A (en), status: unknown
- 2022-10-17: WO application PCT/CN2022/125573 filed; publication WO2023066173A1 (en), status: active, Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116310276A (en) * | 2023-05-24 | 2023-06-23 | 泉州装备制造研究所 | Target detection method, target detection device, electronic equipment and storage medium |
CN116310276B (en) * | 2023-05-24 | 2023-08-08 | 泉州装备制造研究所 | Target detection method, target detection device, electronic equipment and storage medium |
CN117726550A (en) * | 2024-02-18 | 2024-03-19 | 成都信息工程大学 | Multi-scale gating attention remote sensing image defogging method and system |
CN117726550B (en) * | 2024-02-18 | 2024-04-30 | 成都信息工程大学 | Multi-scale gating attention remote sensing image defogging method and system |
CN118521577A (en) * | 2024-07-22 | 2024-08-20 | 中建四局安装工程有限公司 | Control method and related equipment for intelligent production line of threaded connection type fire-fighting pipeline |
CN118521577B (en) * | 2024-07-22 | 2024-10-18 | 中建四局安装工程有限公司 | Control method and related equipment for intelligent production line of threaded connection type fire-fighting pipeline |
Also Published As
Publication number | Publication date |
---|---|
KR20240089729A (en) | 2024-06-20 |
CN116012232A (en) | 2023-04-25 |
Legal Events

| Code | Title | Details |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22882779; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2024523517; Country of ref document: JP; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 20247015956; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 22882779; Country of ref document: EP; Kind code of ref document: A1 |