WO2023066173A1 - Image processing method and apparatus, and storage medium and electronic device
- Publication number
- WO2023066173A1 (PCT/CN2022/125573)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- shadow
- output
- neural network
- processed
- Prior art date
Classifications
- G06T5/94 — Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06T5/60 — Image enhancement or restoration using machine learning, e.g. neural networks
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Description
- The present application relates to image processing technologies, and in particular to an image processing method, apparatus, storage medium, and electronic device.
- Current shadow removal methods either fail to remove shadows completely, lose information from the background layer, or run slowly, which makes them impractical for ordinary users.
- An existing shadow removal method uses a neural network consisting of three modules: a global localization module, an appearance modeling module, and a semantic modeling module.
- The global localization module detects the shadow area and obtains its location features;
- the appearance modeling module learns the characteristics of non-shadow areas, so that the network output is consistent with the labeled data (Ground Truth, GT) in those areas;
- the semantic modeling module restores the original content hidden behind shadows.
- However, this method does not directly output the background image with the shadow removed; it outputs the ratio of the shadow image to the background image. The background image must then be obtained by dividing the shadow image by the network output pixel by pixel, which adds considerable computation, and the division can harm numerical stability when values approach zero.
- Embodiments of the present application provide an image processing method, apparatus, storage medium, and electronic device, to at least solve the technical problems in the prior art that removing shadow areas easily causes side effects on the image background layer and places high demands on the hardware platform.
- An image processing method is provided, including: acquiring an image to be processed that contains a shadow area; and inputting the image to be processed into a trained neural network to obtain a shadow-removed image. The neural network comprises a two-stage cascade of a first-level network and a second-level network: the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
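- As an illustration of the two-stage cascade described above, a minimal PyTorch sketch follows. It is a sketch under stated assumptions, not the patent's implementation: `MaskNet`-style and `RemovalNet`-style submodules are hypothetical stand-ins for the UNet-style first-level and second-level networks.

```python
import torch
import torch.nn as nn

class TwoStageShadowRemoval(nn.Module):
    """First-level network predicts a shadow mask; the second-level network
    receives the input image concatenated with that mask and outputs the
    shadow-removed image."""
    def __init__(self, mask_net: nn.Module, removal_net: nn.Module):
        super().__init__()
        self.mask_net = mask_net        # first-level network: image -> mask map
        self.removal_net = removal_net  # second-level network: image + mask -> result

    def forward(self, image: torch.Tensor):
        mask = self.mask_net(image)           # shadow area mask map (1 channel)
        x = torch.cat([image, mask], dim=1)   # concatenate along the channel axis
        deshadowed = self.removal_net(x)      # shadow-removed image
        return deshadowed, mask
```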
- The first-level network includes: a first feature extraction module, comprising a first encoder, for extracting features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, for estimating the shadow area based on the first set of feature data and outputting a shadow area mask map.
- The second-level network includes: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, for obtaining a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, for outputting the shadow-removed image based on the second set of feature data.
- The output of each layer of the first or second decoder is concatenated along the channel axis, through a cross-layer connection, with the output of the corresponding layer of the first or second encoder.
- A multi-scale pyramid pooling module is added on the cross-layer connections between each decoder and the corresponding encoder; this module fuses features of different scales.
- The image processing method further includes: downsampling the image to be processed with an image pyramid algorithm, saving the gradient information of each pyramid level during downsampling to form a Laplacian pyramid; feeding the smallest layer into the trained neural network to obtain an output image; and reconstructing the output image from low resolution to high resolution using the Laplacian pyramid to obtain the shadow-removed image.
- The above image processing method further includes: constructing an initial neural network; and training the initial neural network with sample data to obtain the trained neural network, where the sample data includes real-shot images and synthetic shadow maps, and a synthetic shadow map is composited from a pure shadow map and a shadow-free map using an image synthesis method.
- Compositing the synthetic shadow map from the pure shadow map and the shadow-free map using the image synthesis method includes: acquiring the pure shadow map; acquiring the shadow-free map; and obtaining the synthetic shadow map based on the two.
- Compositing the synthetic shadow map may further include transforming the pure shadow map and obtaining the synthetic shadow map from the transformed pure shadow map and the shadow-free map, where the pixel values of non-shadow areas in the transformed pure shadow map are uniformly set to a fixed value a and the pixel values of shadow areas lie between 0 and a, a being a positive integer.
- The initial neural network also includes a module for classifying the sample data.
- When the sample input is judged to be a real-shot image, the labeled data is a shadow-removed image collected in the real scene, and the parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-removed image serving as labeled data. When the sample input is judged to be a synthetic shadow map, the labeled data includes the shadow-free image and the pure shadow map: the parameters of the first-level network are adjusted according to the difference between the shadow area mask map and the pure shadow map, and the parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-free image.
- The loss function includes at least one of the following: pixel loss, feature loss, structural similarity loss, adversarial loss, shadow edge loss, and shadow brightness loss.
- The pixel loss includes a pixel truncation loss: when the absolute difference between two corresponding pixels in the output image of the initial neural network and the label image is greater than a given threshold, the loss of the two pixels is calculated; when the absolute difference is not greater than the threshold, the difference between the two pixels is ignored.
- The shadow brightness loss makes the brightness difference between the region corresponding to the shadow area in the shadow-removed image output by the neural network and the shadow area in the input image greater than 0, so as to raise the brightness of that region in the shadow-removed image.
- The above image processing method includes: dilating the shadow area mask map to obtain a dilation map; eroding the shadow area mask map to obtain an erosion map; and taking the difference of the dilation map and the erosion map as the shadow/non-shadow boundary region, which is smoothed using TVLoss.
- An image processing apparatus is provided, including: an image acquisition unit for acquiring an image to be processed that contains a shadow area; and a processing unit for receiving the image to be processed and processing it with a trained neural network to obtain a shadow-removed image. The neural network comprises a two-stage cascade of a first-level network and a second-level network: the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- The first-level network includes: a first feature extraction module, comprising a first encoder, for extracting features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, for estimating the shadow area based on the first set of feature data and outputting a shadow area mask map.
- The second-level network includes: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, for obtaining a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, for outputting the shadow-removed image based on the second set of feature data.
- A storage medium is provided, including a stored program, where, when the program runs, a device on which the storage medium resides is controlled to execute any one of the image processing methods described above.
- An electronic device is provided, including a processor and a memory for storing executable instructions of the processor, where the processor is configured to execute any one of the image processing methods described above by executing the instructions.
- This application proposes a fast and effective shadow removal method suitable for mobile terminals such as mobile phones. It captures the characteristics of the physical phenomenon of shadows, synthesizes highly realistic training material, and combines several different loss functions with an effective network structure and modules to achieve better shadow removal.
- In addition, this application uses downsampling and network pruning, so that even high-resolution images can be processed very quickly.
- FIG. 1 is a flowchart of an optional image processing method according to an embodiment of the present application;
- FIG. 2 is a structural diagram of an optional neural network according to an embodiment of the present application;
- FIG. 3 is a flowchart of optional neural network training according to an embodiment of the present application;
- FIG. 4 is a flowchart of an optional image synthesis method according to an embodiment of the present application;
- FIG. 5(a) and FIG. 5(b) are comparison diagrams of the shadow removal effect achieved by the image processing method of an embodiment of the present application;
- FIG. 6 is a structural block diagram of an optional image processing apparatus according to an embodiment of the present application.
- FIG. 1 is a flowchart of an optional image processing method according to an embodiment of the present application. As shown in FIG. 1, the image processing method includes the following steps:
- The neural network comprises a two-stage cascade of a first-level network and a second-level network: the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- As shown in FIG. 2, the neural network comprises a two-stage cascade of a first-level network 20 and a second-level network 22. The first-level network includes a first feature extraction module 200 and a shadow area estimation module 202; the second-level network includes a second feature extraction module 204 and a result map output module 206.
- The first feature extraction module 200 includes a first encoder that extracts features of the image to be processed layer by layer to obtain the first set of feature data.
- The shadow area estimation module 202 is connected to the output of the first feature extraction module 200 and includes a first decoder that estimates the shadow area based on the first set of feature data and outputs the shadow area mask map.
- The second feature extraction module 204 includes a second encoder connected to the output of the first-level network; it receives the shadow area mask map output by the first-level network together with the image to be processed and produces the second set of feature data.
- The result map output module 206 is connected to the output of the second feature extraction module 204 and includes a second decoder that outputs the shadow-removed image based on the second set of feature data.
- The first-level network and the second-level network have the same structure except for the number of input channels; for example, both can be built on the classic segmentation network UNet.
- the outputs of each layer of the two encoders are respectively concatenated with the outputs of the corresponding layers of the two decoders along the channel axis through cross-layer connections.
- The multi-scale pyramid pooling module includes multiple pooling layers with different kernel sizes, convolutional layers, and interpolation upsampling layers. Features of different scales are first extracted by the pooling layers; low-level and/or high-level features are then extracted by the convolutional layers; the interpolation upsampling layers resize these features to match the outputs of the corresponding encoder and decoder layers; finally, everything is concatenated into one feature along the channel axis.
- By fusing features of different scales, the multi-scale pyramid pooling module enhances the generalization of the network, enabling it to achieve good results on shadow maps of different areas and degrees.
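- A minimal PyTorch sketch of such a pyramid pooling module is given below, assuming adaptive average pooling and illustrative bin sizes and channel counts; none of these hyperparameters are specified by the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePyramidPooling(nn.Module):
    """Pool at several scales, convolve each branch, upsample back to the
    input size, and concatenate everything along the channel axis."""
    def __init__(self, in_ch: int, branch_ch: int = 32, bins=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                     # pooling layer at one scale
                nn.Conv2d(in_ch, branch_ch, 1, bias=False),  # convolutional layer
                nn.ReLU(inplace=True),
            )
            for b in bins
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [x] + [
            # interpolation upsampling layer: resize back to the input size
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat(feats, dim=1)  # fuse features of different scales
```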
- The model can be pruned by replacing the convolutional layers in the encoder with grouped convolutions in which each convolution kernel convolves only one channel, reducing the model's computation and increasing processing speed.
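- The replacement can be sketched as follows; the channel counts are arbitrary examples, and pairing the depthwise convolution with a 1x1 pointwise convolution to restore cross-channel mixing is a common design choice rather than something the source specifies.

```python
import torch.nn as nn

in_ch, out_ch = 64, 64

# Standard convolution: in_ch * out_ch * 3 * 3 weights.
dense = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# Grouped (depthwise) convolution: with groups=in_ch, each kernel convolves
# a single channel, cutting the 3x3 weight count by a factor of in_ch; the
# following 1x1 convolution mixes information across channels again.
depthwise = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),
    nn.Conv2d(in_ch, out_ch, kernel_size=1),
)
```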
- An instance normalization layer is added after the convolutional layers of the encoder and decoder to normalize the features, thereby improving the shadow removal effect.
- In an optional embodiment, the image pyramid algorithm first downsamples the image to be processed, saving the gradient information of each level during downsampling to form a Laplacian pyramid. The layer with the smallest size is then fed into the trained neural network to obtain an output image, and the Laplacian pyramid is finally used to reconstruct the output image. Because the gradient information inside shadow areas is weak, restoring some gradient information of the original image during reconstruction does not affect the shadow removal result.
- In this way the image is reconstructed from the gradient information of all pyramid levels saved during downsampling, eliminating shadows without reducing image resolution.
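- A sketch of this pyramid decomposition and reconstruction follows, assuming OpenCV's pyrDown/pyrUp and an 8-bit input; the level count is an arbitrary example.

```python
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Downsample repeatedly, keeping each level's detail (gradient) layer."""
    gauss, laps = img, []
    for _ in range(levels):
        down = cv2.pyrDown(gauss)
        up = cv2.pyrUp(down, dstsize=(gauss.shape[1], gauss.shape[0]))
        laps.append(gauss.astype(np.float32) - up.astype(np.float32))
        gauss = down
    # `gauss` is the smallest layer, sent to the network; `laps` rebuild detail.
    return gauss, laps

def reconstruct(base: np.ndarray, laps) -> np.ndarray:
    """Rebuild from low resolution to high resolution using the saved layers."""
    out = base.astype(np.float32)
    for lap in reversed(laps):
        out = cv2.pyrUp(out, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return np.clip(out, 0, 255).astype(np.uint8)
```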
- The image processing method also includes constructing an initial neural network, and:
- S302: training the initial neural network with sample data to obtain the trained neural network, where the sample data includes real-shot images and synthetic shadow maps, and a synthetic shadow map is composited from a pure shadow map and a shadow-free map.
- The sample data used to train the initial neural network plays a vital role in the whole image processing method. There are two main ways to obtain it: real-scene acquisition and image synthesis.
- For real-scene acquisition, the collectors select a light environment and subjects according to the scene category (for example, different lighting scenes: warm light, cold light, daylight, etc.), fix the mobile phone, camera, or other shooting device on a tripod, and adjust the light direction and focal length. Using palms, mobile phones, or other common objects as occluders, they cast a shadow on the subject and shoot to obtain a shadow image, then remove the occluder and shoot again to obtain a shadow-free background image, yielding paired sample data.
- In addition, image synthesis methods can be used to generate realistic synthetic shadow maps for training the neural network.
- The image synthesis method includes:
- Under a preset light environment, the data collector lays a piece of white paper on a desktop and uses a palm, mobile phone, or other common object to block the light, leaving a pure shadow map S on the white paper, where all or part of S is a shadow area;
- Optionally, the pure shadow map S is transformed into S′ by uniformly setting the pixel values of its non-shadow areas to a fixed value a and leaving the pixel values of its shadow areas between 0 and a, where a is a positive integer.
- The data collectors then photograph shadow-free maps B of various objects under the same light environment.
- The pure shadow map S (or the transformed pure shadow map S′) is multiplied pixel by pixel with the shadow-free map B to obtain a synthetic shadow map.
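- A sketch of this pixel-wise composition follows. Normalizing the shadow map by the fixed value a, so that non-shadow pixels of B pass through unchanged, is our reading of the transform described above rather than an explicit formula from the source.

```python
import numpy as np

def synthesize_shadow(shadow_map: np.ndarray, background: np.ndarray,
                      a: int = 255) -> np.ndarray:
    """Pixel-wise multiply a (transformed) pure shadow map S' with a
    shadow-free map B; S' equals a outside the shadow, so those regions of
    B are unchanged, while shadow pixels (values in 0..a) darken B."""
    s = shadow_map.astype(np.float32) / a
    composite = background.astype(np.float32) * s
    return np.clip(composite, 0, 255).astype(np.uint8)
```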
- This image synthesis method takes into account the weakening effect of shadows on light; it handles shadows with gentle edge transitions well and produces a strong sense of realism.
- The initial neural network also includes a module for classifying the sample data.
- When the input sample is judged to be a real-shot image, the labeled data (Ground Truth, GT) is the shadow-removed image collected in the real scene. Since no shadow area mask label exists for a real-shot image, only the parameters of the second-level network 22 are adjusted, based on the difference between the shadow-removed image output by the initial neural network and the GT. When the input sample is judged to be a synthetic shadow map, the labeled data includes the shadow-free image and the pure shadow map collected in the real scene: the parameters of the first-level network 20 are adjusted according to the difference between the shadow area mask map and the pure shadow map, and the parameters of the second-level network 22 are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-free image.
- The sample data acquisition may also include one or more augmentation processes, such as random flipping, rotation, color temperature adjustment, channel swapping, and adding random noise to the collected samples, which enriches the sample data and increases the robustness of the network.
- When performing supervised training on the initial neural network, the loss function includes at least one of the following: pixel loss, feature loss, structural similarity loss, and adversarial loss.
- The pixel loss function measures the similarity of two images at the pixel level and mainly includes an image pixel value loss and a gradient loss. In this embodiment it refers to the weighted sum of the mean squared error between the pixel values of the initial neural network's output image and the label image and the L1-norm error between the gradients of the two images.
- The pixel loss supervises the training process at the pixel level, so that each pixel of the network's output image is as close as possible to the corresponding pixel of the label image.
- On this basis, a pixel truncation loss can be introduced to truncate the pixel loss: the loss for a pair of pixels is calculated only when their absolute difference is greater than a given threshold; otherwise the difference is ignored. Adding the pixel truncation loss guides the network to focus on shadow areas and suppresses image noise; it not only enhances the shadow removal effect but also greatly accelerates network convergence.
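- A minimal sketch of such a truncated pixel loss; the threshold value is an arbitrary example (images assumed normalized to [0, 1]).

```python
import torch

def truncated_pixel_loss(output: torch.Tensor, target: torch.Tensor,
                         threshold: float = 0.05) -> torch.Tensor:
    """Count a pixel pair toward the loss only when its absolute difference
    exceeds the threshold, so training focuses on shadow regions instead of
    small global differences and background noise."""
    diff = (output - target).abs()
    mask = (diff > threshold).float()          # 1 where the difference is kept
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)
```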
- The feature loss mainly refers to the weighted sum of the L1-norm errors between corresponding features of the initial neural network's output image and the label image.
- Specifically, a VGG19 network pre-trained on the ImageNet dataset is used as a feature extractor. The output image of the initial neural network and the label image are each fed through the extractor to obtain the features of every VGG19 layer; the L1-norm errors between corresponding features of the two images are then computed and summed with weights.
- The per-layer features of VGG19 are insensitive to image details and noise and have good semantic characteristics. Even if the images contain defects such as noise or misalignment, the feature loss can still accurately produce meaningful differences in shadow areas; it compensates for the pixel loss's sensitivity to noise and provides good stability.
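- A sketch of such a VGG19 feature loss; the layer indices and weights are illustrative choices, and ImageNet input normalization is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class VGGFeatureLoss(nn.Module):
    """Weighted sum of L1 distances between VGG19 features of the network
    output and the label image."""
    def __init__(self, layer_ids=(3, 8, 17, 26), weights=(1.0, 0.75, 0.5, 0.25)):
        super().__init__()
        self.vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)            # frozen feature extractor
        self.layer_ids = set(layer_ids)
        self.layer_weights = dict(zip(layer_ids, weights))

    def forward(self, output: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        loss, x, y = 0.0, output, label
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)          # run both images through VGG19
            if i in self.layer_ids:
                loss = loss + self.layer_weights[i] * (x - y).abs().mean()
        return loss
```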
- The structural similarity loss function measures the similarity of two images from their global features. In this embodiment it mainly refers to the global differences in brightness and contrast between the initial neural network's output image and the label image. Adding this loss effectively suppresses color cast in the network output and improves the overall quality of the image.
- The adversarial loss mainly refers to the loss between the discriminator's output and the true category of the output image.
- As training progresses, the effects of the pixel loss, feature loss, and structural similarity loss gradually diminish and network convergence slows.
- A discriminator network is therefore trained synchronously to assist the training of the main network.
- The output image of the initial neural network and the label image are sent to the discriminator, which judges whether each is the label image; the discriminator's parameters are updated from the loss between its outputs and the true categories. The discriminator's judgment of the output image is then taken as the authenticity loss of the output image, and the parameters of the initial neural network are updated with that loss.
- Training ends when the discriminator cannot distinguish between the output image of the initial neural network and the label image.
- The adversarial loss effectively eliminates side effects introduced by network processing (for example, color inconsistency between shadow and non-shadow areas, residual shadows, etc.) and improves the realism of the network's output images.
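- Anticipating the note below that a Wasserstein GAN supplies the adversarial loss, a minimal sketch of the two Wasserstein loss terms follows; the Lipschitz constraint (gradient penalty or weight clipping) needed in practice is omitted.

```python
import torch

def wgan_losses(critic, real: torch.Tensor, fake: torch.Tensor):
    """Wasserstein critic and generator losses. `critic` maps an image batch
    to a batch of scalar scores."""
    d_loss = critic(fake.detach()).mean() - critic(real).mean()  # critic minimizes
    g_loss = -critic(fake).mean()  # generator pushes its outputs toward "real"
    return d_loss, g_loss
```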
- Threshold truncation loss: because of lighting, paired data collected in real scenes may show slight brightness differences and color shifts even in non-shadow areas; such differences are acceptable to users and need not be corrected. To keep the network's attention from settling on these small global differences during training, the method introduces a threshold truncation loss: a difference between the network output and the GT is included in the gradient of the overall loss only when it exceeds a given threshold; otherwise the loss is taken to be 0. This loss tolerates slight differences between the network output and the GT and shifts the focus of learning to regions with large differences, effectively improving the network's ability to eliminate obvious shadows.
- Shadow edge loss: first, dilate the shadow area mask map to obtain a dilation map; second, erode the shadow area mask map to obtain an erosion map; then take the difference of the dilation map and the erosion map as the shadow/non-shadow boundary region and smooth it with TVLoss, producing an effective transition between shadow and non-shadow areas.
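- A sketch of the boundary-region construction and the TV smoothing term, assuming an 8-bit 0/255 mask and an illustrative kernel size.

```python
import cv2
import numpy as np
import torch

def shadow_boundary_region(mask: np.ndarray, k: int = 7) -> np.ndarray:
    """Boundary band = dilated mask minus eroded mask."""
    kernel = np.ones((k, k), np.uint8)
    return cv2.subtract(cv2.dilate(mask, kernel), cv2.erode(mask, kernel))

def tv_loss(region: torch.Tensor) -> torch.Tensor:
    """Total-variation smoothing term applied to pixels in the boundary band."""
    dh = (region[..., 1:, :] - region[..., :-1, :]).abs().mean()
    dw = (region[..., :, 1:] - region[..., :, :-1]).abs().mean()
    return dh + dw
```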
- Shadow brightness loss: makes the brightness difference between the region corresponding to the shadow area in the shadow-removed map output by the neural network and the shadow area in the input image greater than 0, raising the brightness of that region in the shadow-removed image.
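- A minimal sketch of such a brightness constraint; the exact form of the penalty is an assumption, since the source only states that the brightness difference should be greater than 0.

```python
import torch

def shadow_brightness_loss(output: torch.Tensor, inp: torch.Tensor,
                           mask: torch.Tensor) -> torch.Tensor:
    """Penalize shadow-region pixels that are not brighter in the output
    than in the input; `mask` is the (N, 1, H, W) shadow area mask."""
    diff = (output - inp) * mask               # brightness change inside the mask
    return torch.relu(-diff).sum() / mask.sum().clamp(min=1.0)
```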
- The background-layer output module of the initial neural network uses the weighted sum of all the above losses as the total loss, with a Wasserstein generative adversarial network providing the adversarial loss.
- The network structure extracts both the global and local features of the input image, improving the degree of shadow elimination while protecting non-shadow areas from side effects.
- Fig. 5(a) and Fig. 5(b) are the comparison diagrams of the processing effects realized by the image processing method of the embodiment of the present application, wherein Fig. 5(a) is an image to be processed containing shadows, and Fig. 5(b) is the processed image after The shadow-removed image processed by the image processing method can be seen from the comparison of the two images.
- the image processing method provided by this application can effectively eliminate the shadow without causing significant side effects on the background layer.
- The neural network structure and loss functions used in the embodiments of the present application can also be applied in scenarios such as shadow removal and rain or fog removal. They are mainly intended for high-resolution images taken by mobile terminals such as mobile phones, but are equally applicable to images of various resolutions on PCs or other embedded devices.
- An electronic device is provided, including a processor and a memory for storing executable instructions of the processor, where the processor is configured to execute any of the image processing methods described above.
- The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute any one of the image processing methods described above.
- An image processing apparatus is also provided.
- FIG. 6 is a structural block diagram of an optional image processing apparatus according to an embodiment of the present application.
- As shown in FIG. 6, the image processing apparatus 60 includes an image acquisition unit 600 and a processing unit 602.
- The image acquisition unit 600 is configured to acquire the image to be processed that contains the shadow area.
- The processing unit 602 is configured to receive the image to be processed and process it using a trained neural network to obtain a shadow-removed image, where the neural network comprises a two-stage cascade of a first-level network and a second-level network, and the image to be processed and the output of the first-level network are input to the second-level network simultaneously.
- The structure of the neural network is shown in FIG. 2 and the related description herein, and will not be repeated here.
- The disclosed technical content can be realized in other ways.
- The device embodiments described above are only illustrative.
- The division of the units may be a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of units or modules may be electrical or in other forms.
- The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- Each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- The above integrated units can be implemented in the form of hardware or in the form of software functional units.
- If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- On this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the various embodiments of the present application.
- The aforementioned storage media include various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
Claims (18)
- 1. An image processing method, comprising: acquiring an image to be processed that contains a shadow area; and inputting the image to be processed into a trained neural network to obtain a shadow-removed image; wherein the neural network comprises a two-stage cascade of a first-level network and a second-level network, the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- 2. The image processing method according to claim 1, wherein the first-level network comprises: a first feature extraction module, comprising a first encoder, configured to extract features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, configured to estimate the shadow area based on the first set of feature data and output a shadow area mask map.
- 3. The image processing method according to claim 1, wherein the second-level network comprises: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, configured to obtain a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, configured to output the shadow-removed image based on the second set of feature data.
- 4. The image processing method according to claim 2 or 3, wherein the output of each layer of the first decoder or the second decoder is concatenated along the channel axis, through a cross-layer connection, with the output of the corresponding layer of the first encoder or the second encoder, and a multi-scale pyramid pooling module is added on the cross-layer connection between the first decoder or the second decoder and the first encoder or the second encoder, the multi-scale pyramid pooling module fusing features of different scales.
- 5. The image processing method according to claim 1, wherein, after acquiring the image to be processed that contains the shadow area, the image processing method further comprises: downsampling the image to be processed using an image pyramid algorithm, and saving the gradient information of each pyramid level during downsampling to form a Laplacian pyramid; feeding the layer with the smallest size into the trained neural network to obtain an output image; and reconstructing the output image from low resolution to high resolution using the Laplacian pyramid to obtain the shadow-removed image.
- 6. The image processing method according to claim 1, further comprising: constructing an initial neural network; and training the initial neural network with sample data to obtain the trained neural network, wherein the sample data includes real-shot images and synthetic shadow maps, and a synthetic shadow map is composited from a pure shadow map and a shadow-free map using an image synthesis method.
- 7. The image processing method according to claim 1, wherein compositing the synthetic shadow map from the pure shadow map and the shadow-free map using the image synthesis method comprises: acquiring the pure shadow map; acquiring the shadow-free map; and obtaining the synthetic shadow map based on the pure shadow map and the shadow-free map.
- 8. The image processing method according to claim 7, wherein compositing the synthetic shadow map from the pure shadow map and the shadow-free map using the image synthesis method further comprises: transforming the pure shadow map, and obtaining the synthetic shadow map based on the transformed pure shadow map and the shadow-free map, wherein pixel values of non-shadow areas in the transformed pure shadow map are uniformly set to a fixed value a, and pixel values of shadow areas are values between 0 and a, a being a positive integer.
- 9. The image processing method according to claim 7, wherein the initial neural network further comprises a module for classifying the sample data; when the sample data input to the initial neural network is judged to be a real-shot image, the labeled data is a shadow-removed image collected in the real scene, and parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-removed image serving as the labeled data; when the sample data input to the initial neural network is judged to be a synthetic shadow map, the labeled data includes the shadow-free image and the pure shadow map collected in the real scene, parameters of the first-level network are adjusted according to the difference between the shadow area mask map and the pure shadow map, and parameters of the second-level network are adjusted according to the difference between the shadow-removed image output by the initial neural network and the shadow-free image.
- 10. The image processing method according to claim 6, wherein, when the initial neural network is trained with the sample data, the loss function includes at least one of the following: pixel loss, feature loss, structural similarity loss, adversarial loss, shadow edge loss, and shadow brightness loss.
- 11. The image processing method according to claim 10, wherein the pixel loss includes a pixel truncation loss: when the absolute difference between two corresponding pixels in the output image of the initial neural network and the label image is greater than a given threshold, the loss of the two pixels is calculated; when the absolute difference between the two corresponding pixels is not greater than the given threshold, the difference between the two pixels is ignored.
- 12. The image processing method according to claim 10, wherein the shadow brightness loss makes the brightness difference between the region corresponding to the shadow area in the shadow-removed map output by the neural network and the shadow area in the input image to be processed greater than 0, so as to increase the brightness of the region corresponding to the shadow area in the shadow-removed image.
- 13. The image processing method according to claim 10, wherein, when the loss function includes the shadow edge loss, the image processing method comprises: dilating the shadow area mask map to obtain a dilation map; eroding the shadow area mask map to obtain an erosion map; and taking the difference of the dilation map and the erosion map as the shadow/non-shadow boundary region and smoothing it using TVLoss.
- 14. An image processing apparatus, comprising: an image acquisition unit configured to acquire an image to be processed that contains a shadow area; and a processing unit configured to receive the image to be processed and process it using a trained neural network to obtain a shadow-removed image; wherein the neural network comprises a two-stage cascade of a first-level network and a second-level network, the first-level network receives the image to be processed and outputs a shadow area mask map, and the second-level network receives both the image to be processed and the shadow area mask map and outputs the shadow-removed image.
- 15. The image processing apparatus according to claim 14, wherein the first-level network comprises: a first feature extraction module, comprising a first encoder, configured to extract features of the image to be processed layer by layer to obtain a first set of feature data; and a shadow area estimation module, connected to the output of the first feature extraction module and comprising a first decoder, configured to estimate the shadow area based on the first set of feature data and output a shadow area mask map.
- 16. The image processing apparatus according to claim 14, wherein the second-level network comprises: a second feature extraction module, comprising a second encoder, connected to the output of the first-level network and receiving the shadow area mask map output by the first-level network together with the image to be processed, configured to obtain a second set of feature data; and a result map output module, connected to the output of the second feature extraction module and comprising a second decoder, configured to output the shadow-removed image based on the second set of feature data.
- 17. A storage medium, comprising a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to execute the image processing method according to any one of claims 1 to 13.
- 18. An electronic device, comprising: a processor; and a memory configured to store executable instructions of the processor; wherein the processor is configured to execute, by executing the executable instructions, the image processing method according to any one of claims 1 to 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020247015956A KR20240089729A (en) | 2021-10-18 | 2022-10-17 | Image processing methods, devices, storage media and electronic devices |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111210502.3 | 2021-10-18 | ||
CN202111210502.3A CN116012232A (en) | 2021-10-18 | 2021-10-18 | Image processing method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023066173A1 true WO2023066173A1 (en) | 2023-04-27 |
Family
ID=86019717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/125573 WO2023066173A1 (en) | 2021-10-18 | 2022-10-17 | Image processing method and apparatus, and storage medium and electronic device |
Country Status (3)
Country | Link |
---|---|
KR (1) | KR20240089729A (en) |
CN (1) | CN116012232A (en) |
WO (1) | WO2023066173A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117575976B (en) * | 2024-01-12 | 2024-04-19 | 腾讯科技(深圳)有限公司 | Image shadow processing method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180012101A1 (en) * | 2016-07-08 | 2018-01-11 | Xerox Corporation | Shadow detection and removal in license plate images |
CN111626951A (en) * | 2020-05-20 | 2020-09-04 | 武汉科技大学 | Image shadow elimination method based on content perception information |
CN112819720A (en) * | 2021-02-02 | 2021-05-18 | Oppo广东移动通信有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN112991329A (en) * | 2021-04-16 | 2021-06-18 | 浙江指云信息技术有限公司 | Image shadow detection and elimination method based on GAN |
CN113222845A (en) * | 2021-05-17 | 2021-08-06 | 东南大学 | Portrait external shadow removing method based on convolution neural network |
- 2021-10-18: CN application CN202111210502.3A filed; publication CN116012232A (en), status: active, Pending
- 2022-10-17: KR application KR1020247015956A; publication KR20240089729A (en), status: unknown
- 2022-10-17: WO application PCT/CN2022/125573 filed; publication WO2023066173A1 (en), status: active, Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116310276A (en) * | 2023-05-24 | 2023-06-23 | 泉州装备制造研究所 | Target detection method, target detection device, electronic equipment and storage medium |
CN116310276B (en) * | 2023-05-24 | 2023-08-08 | 泉州装备制造研究所 | Target detection method, target detection device, electronic equipment and storage medium |
CN117726550A (en) * | 2024-02-18 | 2024-03-19 | 成都信息工程大学 | Multi-scale gating attention remote sensing image defogging method and system |
CN117726550B (en) * | 2024-02-18 | 2024-04-30 | 成都信息工程大学 | Multi-scale gating attention remote sensing image defogging method and system |
CN118521577A (en) * | 2024-07-22 | 2024-08-20 | 中建四局安装工程有限公司 | Control method and related equipment for intelligent production line of threaded connection type fire-fighting pipeline |
CN118521577B (en) * | 2024-07-22 | 2024-10-18 | 中建四局安装工程有限公司 | Control method and related equipment for intelligent production line of threaded connection type fire-fighting pipeline |
Also Published As
Publication number | Publication date |
---|---|
KR20240089729A (en) | 2024-06-20 |
CN116012232A (en) | 2023-04-25 |
Legal Events

| Code | Title | Details |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22882779; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2024523517; Country of ref document: JP; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 20247015956; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 22882779; Country of ref document: EP; Kind code of ref document: A1 |