
CN111696028A - Method and device for cartoonizing real scene images, computer device, and storage medium

Info

Publication number
CN111696028A
Authority
CN
China
Prior art keywords
cartoon
image
loss value
abstract
real scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010440936.1A
Other languages
Chinese (zh)
Inventor
何盛烽
李思敏
孙子荀
刘婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Tencent Technology Shenzhen Co Ltd
Original Assignee
South China University of Technology SCUT
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Tencent Technology Shenzhen Co Ltd filed Critical South China University of Technology SCUT
Priority to CN202010440936.1A
Publication of CN111696028A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/047 - Probabilistic or stochastic networks
    • G06N3/08 - Learning methods
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/181 - Segmentation; Edge detection involving edge growing; involving edge linking
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20024 - Filtering details
    • G06T2207/20028 - Bilateral filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an artificial-intelligence-based method and device for cartoonizing real scene images, a computer device, and a storage medium. The method comprises the following steps: acquiring a real scene image; performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines; stylizing the abstract cartoon image to generate a style cartoon image with an artistic style; and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image. The method can improve the quality of the generated cartoon images.

Description

Method and device for cartoonizing real scene images, computer device, and storage medium
Technical Field
The present application relates to the fields of artificial intelligence and image processing, and in particular to a method and apparatus for cartoonizing real scene images, a computer device, and a storage medium.
Background
With the rapid development of science and technology, advanced image processing techniques keep emerging, and their application scenarios are becoming increasingly broad. Cartoonizing real scene images is one such scenario.
Traditional cartoonization methods convert a real picture using manually designed operators or detection parameters. They do not take the content of the picture into account, only apply a fixed transformation, and involve no artistic processing, so the quality of the generated cartoon images is poor. How to generate high-quality cartoon images with other techniques (e.g., artificial intelligence techniques) is therefore a problem worth considering.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for cartoonizing real scene images that can improve image quality, so as to solve the above technical problems.
A method for cartoonizing real scene images comprises the following steps:
acquiring a real scene image;
performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines;
stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
A device for cartoonizing real scene images, comprising:
an acquisition module, used for acquiring a real scene image;
an abstract processing module, used for performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines, and for stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and a line generation module, used for generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
In one embodiment, the real scene image is extracted from the video frame sequence in sequence; the video frame sequence comprises at least two real scene images which are arranged in sequence;
the device still includes:
and the output module is used for, when the video frame sequence is an image frame sequence in a video file, generating a cartoon video file according to the cartoonized image of each real scene image in the video frame sequence, or acquiring and outputting, in real time while the video file is played, the cartoonized image corresponding to each real scene image in the video frame sequence.
In one embodiment, the output module is further configured to, when the video frame sequence is an image frame sequence in a real-time video stream, output a cartoonized video stream in real time according to the cartoonized images of the real scene images in the video frame sequence.
In one embodiment, the abstract processing module is further configured to input the real scene image into a trained deep neural network, so as to perform semantic extraction on the real scene image in the first stage, perform image reconstruction processing on the real scene image based on the extracted semantic information, and perform bilateral filtering smoothing processing on reconstructed image content, thereby generating the abstract cartoon image.
In one embodiment, the real scene image is input into a first generator of an abstract network in the deep neural network; the device further includes:
the model training module is used for acquiring a sample data set; the sample data set comprises a first set of real scene sample graphs; inputting a sample data set into a deep neural network to be trained for iterative training, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value; in each iteration, adjusting network model parameters according to a target loss value until the iteration is stopped to obtain a trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network; the structure reconstruction loss value is used for representing the difference of the structure characteristics between the cartoon image generated by the first generator according to the real scene sample diagram and the real scene sample diagram; and the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity.
In one embodiment, the abstract processing module is further configured to perform stylization on the abstract cartoon image through the first generator in the first stage to generate a stylized cartoon image with artistic style.
In one embodiment, the abstract network further comprises a first discriminator; the network model parameters further comprise parameters of the first discriminator; the sample data set further comprises a second set of abstract cartoon sample images; the model training module is further configured to determine, in each iteration, a structure reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value, and to determine the target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value; the style enhancement loss value is determined by the first discriminator according to a first probability output for the cartoon image generated by the first generator; the first probability is used to represent the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample images.
In one embodiment, the model training module is further configured to obtain a third set of original cartoon images; the original cartoon images in the third set have the same artistic style; extracting contour edge lines of each original cartoon image according to a pre-trained line tracking network; generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from an original cartoon image.
In one embodiment, the deep neural network is a two-stage neural network, and the abstract network is the first-stage abstract network in the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network; the line generation module is further configured to, in the second stage, input the style cartoon image generated in the first stage into a second generator of the second-stage line drawing network in the trained two-stage neural network, so as to generate contour edge lines for the style cartoon image through the second generator and obtain the cartoonized image of the real scene image; the second-stage line drawing network is a deep neural network used for generating contour edge lines in the second stage.
In one embodiment, the abstract cartoon sample map is an abstract image with contour edge lines removed from the original cartoon image; the sample data set further comprises a third set of original cartoon images corresponding to the abstract cartoon sample graph; the network model parameters further include parameters of the second generator; the model training module is further used for determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value; and the edge line distribution loss value is used for representing the difference between the image with the line generated by taking the abstract cartoon sample diagram as the input of the second generator and the original cartoon image corresponding to the abstract cartoon sample diagram.
In one embodiment, the second-stage line drawing network further comprises a second discriminator; the network model parameters further comprise parameters of the second discriminator; the model training module is further configured to determine, in each iteration, a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value, and an edge line enhancement loss value, and to determine the target loss value according to these five loss values; the edge line enhancement loss value is determined from a second probability output by the second discriminator after the image with lines, generated by the second generator from the image produced by the first generator, is input into the second discriminator; the second probability is used to represent the line intensity difference between the image with lines generated by the second generator and the original cartoon image.
In one embodiment, the model training module is further configured to obtain a first set of real scene sample images, a second set of abstract cartoon sample images, and a third set of original cartoon images corresponding to the abstract cartoon sample images; inputting the first set into an abstract network to be trained for structure reconstruction iterative training, determining a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjusting parameters of a first generator according to the structure reconstruction loss value until an initialization training stop condition is reached to obtain an initialization abstract network; inputting the first set and the second set into an initialization abstract network, inputting the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration; determining a comprehensive loss value according to the structure reconstruction loss value, the bilateral filtering smooth loss value, the style enhancement loss value and the edge line distribution loss value, adjusting parameters of a first generator and a first discriminator of an initialized abstract network and parameters of a second generator and a second discriminator of a second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops, and obtaining a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a real scene image;
performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines;
stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a real scene image;
performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines;
stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
The above method, device, computer device, and storage medium for cartoonizing real scene images deeply understand the key semantic information of the real scene image, and perform image reconstruction and abstract smoothing on the real scene image based on that semantic information to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain. Because image reconstruction and abstract smoothing are combined, the salient structure of the real scene image is preserved while irrelevant details are smoothed away, so that the colors and image content are smooth. The abstract cartoon image is then stylized to generate a style cartoon image with an artistic style. In other words, the first stage abstracts and smooths the image and abstracts and stylizes its content, so that contour edge lines are generated on the basis of the style cartoon image obtained in the first stage. This avoids inducing unnecessary lines in non-salient areas and concentrates the lines clearly on the contour edges of salient structure regions, yielding a cartoon image of higher image quality.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a method for cartoon processing of images of a real scene;
FIG. 2 is a flowchart illustrating a method for processing cartoon images of a real scene according to an embodiment;
FIG. 3 is a schematic diagram of a network architecture for a two-stage neural network in one embodiment;
FIG. 4 is a schematic representation of a cartoon image of an embodiment;
FIG. 5 is a graph illustrating the results of various states of one embodiment;
FIG. 6 is a graph showing the effect of the comparison between the present disclosure and the reference set according to one embodiment;
FIG. 7 is a comparison of different styles of cartoon images in one embodiment;
FIG. 8 is a block diagram showing a processing apparatus for cartoon real scene images in one embodiment;
FIG. 9 is a block diagram showing a processing apparatus for cartoon representation of an image of a real scene in another embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device in one embodiment;
fig. 11 is an internal configuration diagram of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for cartoonizing real scene images provided by the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
The terminal 102 may obtain a real scene image input by a user or acquired by the terminal, send the real scene image to the server 104, and the server 104 executes the method for processing the cartoon real scene image in the embodiments of the present application. The server 104 may output the generated cartoon image to the terminal 102 for presentation.
It can be understood that the terminal 102 itself may also execute the method for cartoonizing real scene images in the embodiments of the present application, in which case the terminal 102 directly converts the acquired real scene image into a cartoon image and outputs it for display. The execution subject of the method in the embodiments of the present application is not limited here; that is, either the terminal 102 or the server 104 may execute it.
It can be understood that the method for processing the cartoon real scene image in the embodiments of the present application is equivalent to using an artificial intelligence technology to automatically convert the real scene image into the cartoon image.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline involving a wide range of technologies at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
It can be understood that, in the processing method for cartoon real scene images in the embodiments of the present application, a computer vision technology in an artificial intelligence technology is used to perform image processing such as abstract smoothing processing, stylization processing, edge line generation and the like on real scene images, so as to automatically convert the real scene images into cartoon images, thereby implementing cartoon real scene image processing.
Computer Vision (CV) is the science of studying how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to perform machine vision tasks such as recognition, tracking, and measurement on a target, and further processes the images so that they become more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of obtaining information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
In addition, the method for cartoonizing real scene images in the embodiments of the present application also uses machine learning, another branch of artificial intelligence. In the embodiments of the present application, machine learning is used to understand image semantics and to perform image reconstruction and abstract smoothing. Some embodiments of the present application also involve the training and use of a deep neural network, which is described in sufficient detail below, so the present application realizes cartoonization of real scene images based on machine learning.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
In one embodiment, as shown in fig. 2, a method for processing cartoon of real scene image is provided, which is described by taking the method as an example applied to a computer device, where the computer device may be the server or the terminal in fig. 1, and includes the following steps:
step 202, acquiring a real scene image.
A real scene image is an image captured of a real-world scene.
In one embodiment, the image content of the real scene image may include at least one of an environment image content and an object image content of the real scene.
It is understood that the environment image content is used to represent the image content of the background environment in which an object is located. The object may be at least one of a person, an animal, an item, and the like. Accordingly, the object image content may be at least one of person image content, animal image content, and item image content.
In one embodiment, the real scene image may be a single picture or an image in a sequence of video frames.
In one embodiment, the sequence of video frames may be the sequence of image frames in an already generated video file; that is, each real scene image may be one frame of the video file. This enables cartoonization of videos.
In one embodiment, the sequence of video frames may also be the sequence of image frames in a real-time video stream; that is, each real scene image may be one frame of the real-time video stream. This enables real-time cartoonization of videos.
The method in each embodiment of the present application cartoonizes the real scene image so as to convert it into a cartoon image. A cartoon image is a non-realistic image that is reproduced by extracting characteristic elements from a natural, real prototype (i.e., the real scene image) and rendering them with artistic techniques.
Step 204, performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain; the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines.
Semantic information refers to the feature information of the image.
Image reconstruction refers to the process of reconstructing and generating the image.
Abstract smoothing, i.e., filtering, is used to eliminate unimportant details in the image and to abstract the image content into a cartoon image. It can be understood that, since a real scene image contains rich image details that should not appear in a cartoon image, abstract smoothing is performed to remove unnecessary interference and to smooth the distribution of colors and the like.
The salient structure is the structural feature of a key area in a real scene image. I.e. key image content in the real scene. The contour edge line refers to an edge contour line of image content in a real scene image. It will be appreciated that each region in the image of the real scene has image content, and each portion of image content has a corresponding edge contour.
The abstract cartoon image is a non-realistic cartoon image obtained by mapping the real scene image from the real domain onto the cartoon domain. It can be understood that, in the embodiments of the present application, performing image reconstruction on the real scene image based on its semantic information is equivalent to reconstructing the image while taking its key features into account, so the generated abstract cartoon image retains the salient structure of the real scene image. Moreover, by combining image reconstruction with abstract smoothing, unnecessary details are removed, so the abstract cartoon image lacks unnecessary details such as the contour edge lines of the real scene image and becomes smoother.
Specifically, the computer device may input the real scene image to a pre-trained deep neural network to perform semantic extraction processing on the real scene image, perform image reconstruction processing on the real scene image based on the extracted semantic information, and then perform abstract smoothing processing on the reconstructed image to obtain an abstract cartoon image of the real scene image mapped on the cartoon domain.
In one embodiment, the pre-trained deep neural network may include at least two stages of neural networks. Therefore, the computer device may input the real scene image into the neural network of the first stage in the deep neural network. It is understood that in other embodiments, the deep neural network may be only a single stage neural network.
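To make the two-stage organization concrete, the following is a minimal, hypothetical PyTorch sketch of the inference pipeline. The generator architectures (StageOneGenerator, StageTwoGenerator) and the function name cartoonize are illustrative assumptions, not the exact networks of this application.

```python
# Hypothetical inference sketch for a two-stage cartoonization pipeline.
# The concrete generator architectures are placeholders, not the networks of this application.
import torch
import torch.nn as nn

class StageOneGenerator(nn.Module):
    """First stage: semantic extraction, image reconstruction, abstract smoothing and
    stylization (placeholder architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class StageTwoGenerator(nn.Module):
    """Second stage: draws contour edge lines on the style cartoon image (placeholder)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def cartoonize(real_image: torch.Tensor,
               g1: StageOneGenerator,
               g2: StageTwoGenerator) -> torch.Tensor:
    """real_image: (1, 3, H, W) tensor in [-1, 1]."""
    style_cartoon = g1(real_image)   # stage 1: abstract, smoothed and stylized cartoon image
    cartoon = g2(style_cartoon)      # stage 2: add contour edge lines
    return cartoon
```

In this reading, the first stage corresponds to steps 204 and 206 below, and the second stage to step 208.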
And step 206, stylizing the abstract cartoon image to generate a style cartoon image with artistic style.
The stylization processing refers to processing for giving an artistic style to an abstract cartoon image.
The artistic style refers to the comprehensive, overall character expressed in literary and artistic creation, i.e., the representative appearance of an artwork as a whole. The style cartoon image is the abstract cartoon image from step 204 with an artistic style added.
It is understood that different artists have different artistic styles. Therefore, by stylizing the abstract cartoon image, a style cartoon image similar to the creation style of a particular artist can be generated; for example, the abstract cartoon image may be stylized into a style cartoon image that imitates the creative style of a specific cartoonist.
It should be noted that the conversion of the abstract cartoon image into a style cartoon image with a fixed artistic style is not limited herein. In fact, the abstract cartoon image can be stylized into stylized cartoon images with different artistic styles respectively as required.
In one embodiment, the computer device may also stylize the abstract cartoon image using the deep neural network of step 204 to generate a stylized cartoon image having an artistic style.
In another embodiment, the computer device may also use a preset stylized template to perform image superposition and fusion processing on the abstract cartoon image and the stylized template to generate a stylized cartoon image with artistic style. For example, an artistic style can be added to the abstract cartoon image by adding a filter (i.e., a stylized template).
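For this filter/template variant, a simple superposition and fusion can be expressed as pixel-wise alpha blending. The sketch below is an assumption for illustration; the blend weight alpha and the exact form of the style template are not specified by this application.

```python
# Hypothetical stylization by superposing a preset style template (a "filter").
# The alpha weight and the template image are assumptions for illustration.
import numpy as np
from PIL import Image

def apply_style_template(abstract_cartoon: Image.Image,
                         style_template: Image.Image,
                         alpha: float = 0.3) -> Image.Image:
    style_template = style_template.resize(abstract_cartoon.size).convert("RGB")
    base = np.asarray(abstract_cartoon.convert("RGB"), dtype=np.float32)
    tmpl = np.asarray(style_template, dtype=np.float32)
    fused = (1.0 - alpha) * base + alpha * tmpl   # pixel-wise superposition and fusion
    return Image.fromarray(fused.clip(0, 255).astype(np.uint8))
```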
It will be appreciated that, compared with the abstract cartoon image, the style cartoon image adds artistic style information but still lacks contour edge lines.
Step 208, generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
The cartoonized image of the real scene image is a cartoon image that has contour edge lines, retains the salient structure of the real scene image, and has an artistic style.
Specifically, the computer device may perform edge line generation on the style cartoon image to produce its contour edge lines, thereby obtaining the cartoonized image of the real scene image.
In one embodiment, when the deep neural network in step 204 includes at least two stages of neural networks, the computer device may input the real scene image into the neural network of the second stage in the deep neural network to perform edge line generation processing on the style cartoon image.
In other embodiments, the computer device may also train a machine learning model that is dedicated to generating edge lines (i.e., independent of the machine learning model of step 204) to generate contour edge lines for stylized cartoon images. In addition, the computer equipment can also use an edge detection operator to search the image contour edge in the style cartoon image and generate a contour edge line, so that the cartoon image after the cartoon image of the real scene is cartoon-ized is obtained.
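As an illustration of the edge-detection-operator variant, the following sketch uses the Canny operator from OpenCV; the thresholds and the way the lines are overlaid on the style cartoon image are assumptions, not the method of this application.

```python
# Hypothetical contour-edge-line generation with a classical edge detection operator.
# Thresholds and the overlay color are illustrative assumptions.
import cv2
import numpy as np

def add_contour_edge_lines(style_cartoon_bgr: np.ndarray,
                           low: int = 80, high: int = 160) -> np.ndarray:
    gray = cv2.cvtColor(style_cartoon_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)       # binary map of image contour edges
    cartoon = style_cartoon_bgr.copy()
    cartoon[edges > 0] = (0, 0, 0)           # draw dark contour edge lines
    return cartoon
```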
The above method for cartoonizing real scene images deeply understands the key semantic information of the real scene image and performs image reconstruction and abstract smoothing on it based on that semantic information to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain. Because image reconstruction and abstract smoothing are combined, the salient structure of the real scene image is preserved while irrelevant details are smoothed away, keeping colors and image content smooth. The abstract cartoon image is then stylized to generate a style cartoon image with an artistic style. This is equivalent to abstractly smoothing the image (to smooth the non-salient structure) and abstracting and stylizing its content in the first stage, so that contour edge lines are generated on the basis of the style cartoon image obtained in the first stage. As a result, unnecessary lines are not induced in non-salient regions, the lines are clearly concentrated on the contour edges of salient structure regions, and a cartoon image of higher image quality is obtained.
In one embodiment, the real scene image is extracted from a sequence of video frames in sequence; the video frame sequence comprises at least two real scene images which are arranged in sequence.
It will be appreciated that the sequence of video frames may be a sequence of image frames in a video file, or may be a sequence of image frames in a real-time video stream.
Then, according to the method in the embodiments of the present application, a corresponding cartoon image can be generated for each real scene image in the sequence of video frames.
In one embodiment, the method further comprises: when the video frame sequence is an image frame sequence in a video file, generating a cartoon video file according to the cartoonized image of each real scene image in the video frame sequence.
Specifically, when the sequence of video frames is the sequence of image frames in a video file, the computer device may apply the method in the embodiments of the present application to each real scene image in the video frame sequence in order, converting each real scene image into a cartoon image, and then generate a cartoon video file from the cartoonized images of the real scene images.
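A frame-by-frame conversion of a video file could be sketched as follows with OpenCV, assuming a cartoonize_frame function that wraps the trained two-stage network; the codec and the function names are placeholders.

```python
# Hypothetical sketch: convert a video file into a cartoonized video file frame by frame.
# cartoonize_frame is assumed to wrap the trained two-stage network; paths and codec are placeholders.
import cv2

def cartoonize_video(src_path: str, dst_path: str, cartoonize_frame) -> None:
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()            # each frame is one real scene image
        if not ok:
            break
        writer.write(cartoonize_frame(frame))
    cap.release()
    writer.release()
```

The same loop applies to a real-time video stream when the frames are read from a camera or network source instead of a file.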
It can be understood that this way is equivalent to generating a corresponding cartoonized video file by the method in the embodiment of the present application before playing the video file, and then directly playing the cartoonized video file.
In one embodiment, the method further comprises: and when the video frame sequence is an image frame sequence in a video file, acquiring the cartoon images corresponding to the real scene images in the video frame sequence in real time and outputting the cartoon images when the video file is played.
Specifically, in this embodiment, after each real scene image in the video frame sequence has been converted into a corresponding cartoon image, the computer device may record the correspondence between the real scene images and the cartoon images instead of generating a separate cartoon video file. Then, when the video file is subsequently played, i.e., when the video frame sequence corresponding to the video file is played, the computer device may output, in real time and according to the recorded correspondence, the cartoonized image of each real scene image in the video frame sequence.
In an embodiment, when the computer device is a terminal, the terminal may display, in real time, the cartoonized image of each real scene image in the video frame sequence after detecting that the user has triggered the cartoon play mode for the video file. When the cartoon play mode has not been triggered for the video file, i.e., in the normal play mode, the computer device may play the video file directly. It can be understood that this embodiment provides two play modes for one video file (a normal play mode and a cartoon play mode).
When the computer device is a server, the cartoon images corresponding to the real scene images in the video frame sequence of the played video file are output from the server to the terminal in real time while the video file is played on the terminal, and the terminal then displays the cartoon images in real time.
In one embodiment, the method further comprises: when the video frame sequence is an image frame sequence in a real-time video stream, outputting a cartoonized video stream in real time according to the cartoonized images of the real scene images in the video frame sequence.
It can be understood that, when the sequence of video frames is the sequence of image frames in a real-time video stream, the computer device may apply the method in the embodiments of the present application to each real scene image as it is acquired from the video stream, converting it into a corresponding cartoon image and displaying that cartoon image in real time. In this way, the sending end transmits the video stream of real scene images in real time, while the display end shows the cartoonized images of those real scene images in real time, i.e., the cartoonized video stream is output in real time.
It can be understood that, when the computer device is a terminal, outputting the cartoonized video stream in real time refers to outputting the display cartoonized video stream in real time. When the computer equipment is a server, outputting the cartoon video stream in real time refers to outputting the cartoon video stream to the terminal in real time so that the terminal can display the cartoon video stream in real time.
Some scenarios are now given as examples. In a live-streaming scenario, the sending end collects real scene images of the live video stream and sends them to the server; the server cartoonizes the real scene images to generate cartoon images and streams them in real time to the receiving end, so that the real scene in which the anchor is located is converted into a cartoon scene and presented to the users watching the live stream. In a video call scenario in instant messaging, the real scene images of both parties can be converted into cartoon images and displayed in real time as video streams, so that both parties see cartoon dynamic scenes. Likewise, when videos are watched online on video content platforms, the user can select a cartoon play mode for an original online video composed of real scene images, and the cartoonized video stream is output in real time, so that the cartoon images corresponding to the real scene images are played.
In the embodiment, the video file or the video stream in the real scene can be converted into the cartoon video file or the cartoon video stream, so that the diversity of the video scene is realized. In addition, the generated cartoon video file or cartoon video stream has high-quality cartoon images, and the picture quality of the video file or video stream is improved.
In one embodiment, the image reconstruction processing and the abstract smoothing processing are performed on the real scene image based on the semantic information of the real scene image, and obtaining the abstract cartoon image of the real scene image mapped on the cartoon domain includes: inputting the real scene image into the trained deep neural network to perform semantic extraction on the real scene image, performing image reconstruction processing on the real scene image based on the extracted semantic information, and performing bilateral filtering smoothing processing on the reconstructed image content to generate an abstract cartoon image.
The deep neural network is used for extracting semantic information of a real scene image and performing image reconstruction processing and bilateral filtering smoothing processing based on the semantic information. Bilateral filtering smoothing refers to filtering abstract smoothing processing on an image from two dimensions, namely a pixel value and a spatial position.
It should be noted that the deep neural network is not limited to be capable of implementing only the above-described processing, and may implement other processing.
The deep neural network may be a single neural network or a multi-stage neural network. It is understood that a multi-stage neural network includes at least two stages of neural networks.
When the deep neural network is an independent neural network, the computer device may separately iteratively train the deep neural network in advance according to the sample data set to obtain a final deep neural network. When the deep neural network is a multi-stage neural network, the computer device may perform iterative training on the neural networks of the stages included in the multi-stage neural network together, thereby implementing training on the multi-stage deep neural network.
In this embodiment, in the first stage, the deep neural network deeply understands the key semantic information of the real scene image and performs image reconstruction and abstract smoothing on it based on that semantic information to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain. Combining image reconstruction with bilateral filtering smoothing preserves the salient structure of the real scene image while smoothing away irrelevant details, which improves the accuracy of the image processing. Moreover, bilateral filtering performs the filtering-based abstract smoothing from two dimensions, pixel value and spatial position, which improves the accuracy of the abstract smoothing and the quality of the processed image. A high-quality cartoon image can subsequently be generated on the basis of the abstract cartoon image produced in this first stage.
In one embodiment, the real scene image is input into a first generator of an abstract network in a deep neural network. In this embodiment, the training step of the deep neural network includes: acquiring a sample data set; the sample data set comprises a first set of real scene sample graphs; inputting a sample data set into a deep neural network to be trained for iterative training, determining a structure reconstruction loss value and a bilateral filtering smooth loss value in each iteration, determining a target loss value according to the structure reconstruction loss value and the bilateral filtering smooth loss value, and adjusting network model parameters in each iteration according to the target loss value until the iteration is stopped to obtain the trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network.
It is understood that in the embodiment of the present application, the deep neural network includes an abstract network, the abstract network includes a first generator, and the computer device inputs the real scene image into the first generator to generate the abstract cartoon image. Namely, the first generator of the abstract network is used for extracting and carrying out image reconstruction and bilateral filtering smoothing processing based on the semantic information of the real scene image, thereby generating the abstract cartoon image.
The abstract cartoon image may be the final output data of the first generator or an intermediate result generated by the first generator, i.e. the first generator may further process the abstract cartoon image and then output the abstract cartoon image accordingly.
The deep neural network is a pre-trained neural network. The sample data set and specific training steps for training the deep neural network will be described below.
And the sample data set is a training set for training the deep neural network. The sample data set includes a first set of real scene sample graphs, but is not limited to only include the first set, and may include other sets of sample data. That is, the sample data set may include a plurality of different types of sample sets, where the first set is a type of sample set (i.e., a set of real scene sample graphs).
In one embodiment, the first set may include an original real scene sample map, and may also include a real scene sample map preprocessed with respect to the original real scene sample map.
Specifically, the computer device may obtain an original real scene sample map, randomly cut the original real scene sample map into an image with a preset size, adjust the image to a preset resolution, use the image after the size adjustment and the resolution adjustment as a final real scene sample map, further form a first set, and input the first set into the deep neural network for training.
For example, 5000 real-world photos (i.e., original real scene sample images) are collected. 5000 real-world photos are randomly cut into small blocks with preset sizes (for example, the size is 500 × 500), and then the resolution is adjusted to be 256 × 256, so that the adjusted images are the first set, namely the set of real scene sample images. The adjusted images (i.e., the first set) may then be input into a deep learning network to perform training. It can be understood that the original real scene sample graph is preprocessed through size adjustment and resolution adjustment and then is regenerated into the first set, so that the accuracy of subsequent deep learning network training can be improved, and the data processing pressure can be reduced.
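The preprocessing described in this example (random 500 x 500 crops resized to 256 x 256) could be written with torchvision transforms as in the sketch below; the interpolation mode and any normalization are assumptions not specified in the example.

```python
# Hypothetical preprocessing of the original real scene sample photos into the first set.
# Crop size 500 and target resolution 256 follow the example above; other details are assumptions.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.RandomCrop(500),        # randomly cut a 500 x 500 patch (photo must be at least 500 x 500)
    transforms.Resize((256, 256)),     # adjust the resolution to 256 x 256
    transforms.ToTensor(),             # to a [0, 1] tensor of shape (3, 256, 256)
])

# first_set = [preprocess(photo) for photo in real_scene_photos]  # real_scene_photos: list of PIL images
```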
The target loss value refers to a final loss value in each iteration when the deep neural network is iteratively trained.
In one embodiment, the computer device may obtain a structural reconstruction loss function and a bilateral filtering smoothing loss function designed in advance for the abstract network, further construct a target loss function according to the structural reconstruction loss function and the bilateral filtering smoothing loss function, and input the sample data set into the deep neural network to be trained for iterative training. And adjusting the network model parameters of the deep neural network in each iteration to minimize the value of the target loss function until the iteration is stopped, so as to obtain the trained deep learning network.
It can be understood that, in each iteration, the computer device may calculate a structure reconstruction loss value and a bilateral filtering smoothing loss value according to a structure reconstruction loss function and a bilateral filtering smoothing loss function in the target loss function, determine a target loss value corresponding to each iteration according to the structure reconstruction loss value and the bilateral filtering smoothing loss value, and adjust a network model parameter of the deep neural network according to the target loss value to find a minimized target loss value, so that the deep neural network gradually tends to converge until the iteration is stopped, thereby obtaining a trained deep learning network.
It should be noted that supervised iterative training is performed according to the structural reconstruction loss function, and unsupervised iterative training is performed according to the bilateral filtering smoothing loss function. Namely, the image content reconstruction and abstract smoothing processing are carried out by using the supervised structural reconstruction loss and the unsupervised bilateral filtering smoothing loss.
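One iteration of this first-stage training could be sketched as follows, assuming that structure_reconstruction_loss and bilateral_smoothing_loss implement the two loss terms defined below and that the target loss is their weighted sum; the loss weights and the optimizer are assumptions, since the application does not fix them here.

```python
# Hypothetical single training iteration for the abstract network (first stage).
# Loss weights, optimizer and helper functions are illustrative assumptions.
import torch

def train_step(generator, optimizer, real_batch, edge_mask_batch,
               structure_reconstruction_loss, bilateral_smoothing_loss,
               w_ssc=1.0, w_fla=1.0):
    optimizer.zero_grad()
    fake_cartoon = generator(real_batch)
    loss_ssc = structure_reconstruction_loss(fake_cartoon, real_batch, edge_mask_batch)
    loss_fla = bilateral_smoothing_loss(fake_cartoon, real_batch)
    target_loss = w_ssc * loss_ssc + w_fla * loss_fla   # target loss value for this iteration
    target_loss.backward()
    optimizer.step()                                     # adjust the network model parameters
    return target_loss.item()
```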
It can be understood that, because the deep neural network includes the abstract network, the training of the abstract network is necessarily involved in the training process of the deep neural network, and the adjustment of the parameters of the first generator is necessarily involved in the adjustment of the parameters of the network model. Therefore, the trained deep neural network includes the trained abstract network.
The target loss value is not limited to be determined only from the structure reconstruction loss value and the bilateral filtering smoothing loss value. When the deep neural network further comprises other networks or has other processing, the target loss value can be determined together with other loss values.
The structure reconstruction loss value is used for representing the difference of the structure characteristics between the cartoon image generated by the first generator according to the real scene sample diagram and the real scene sample diagram. It is understood that the structural reconstruction loss function is considered when constructing the target loss function, so as to limit the structural similarity between the input real scene image and the cartoon image output by the first generator, i.e., to enable the cartoon image generated by the first generator to retain the significant structure (i.e., retain the key structural features) in the input real scene sample map.
In one embodiment, the structural reconstruction loss function for an abstract network design is as follows:
L_ssc = Σ (1 + ρB) ||G(I) − I||_2
where L_ssc is the structure reconstruction loss function; G(I) denotes the cartoon image generated by the first generator (i.e., the predicted abstract cartoon image); I denotes the input real scene sample image; B denotes a binary mask that is 1 at structure edges and 0 elsewhere; and ρ is the weight given to salient edges. It is understood that the binary mask may be generated with a pre-trained line tracing network, i.e., the structural edge lines of the real scene sample image are extracted by the line tracing network and the resulting line image is converted into the binary mask.
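A direct reading of this loss in PyTorch might look like the following sketch; the value of ρ, the batch averaging, and taking the per-pixel L2 norm over color channels are assumptions consistent with, but not dictated by, the formula above.

```python
# Hypothetical implementation of the structure reconstruction loss L_ssc.
# rho and the per-pixel L2 norm over color channels are assumptions for illustration.
import torch

def structure_reconstruction_loss(fake_cartoon: torch.Tensor,   # G(I), shape (B, 3, H, W)
                                  real_image: torch.Tensor,     # I,    shape (B, 3, H, W)
                                  edge_mask: torch.Tensor,      # B,    shape (B, 1, H, W), 1 at structure edges
                                  rho: float = 3.0) -> torch.Tensor:
    per_pixel = torch.norm(fake_cartoon - real_image, p=2, dim=1, keepdim=True)  # ||G(I) - I||_2
    weighted = (1.0 + rho * edge_mask) * per_pixel
    return weighted.sum() / real_image.shape[0]   # average over the batch (assumption)
```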
And the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity. It can be understood that the bilateral filtering smoothing loss function is considered when constructing the target loss function, so as to comprehensively consider the similarity of pixel values and the similarity of spatial positions to perform abstract smoothing processing on the image. That is, the image is subjected to filter abstract smoothing processing from two dimensions, i.e., pixel values and spatial positions.
In particular, the computer device may obtain a pre-designed bilateral filtering smoothing loss function. Then, in each iteration process, a bilateral filtering smoothing loss function can be combined, and comprehensive consideration is carried out from two aspects of pixel value similarity and spatial position similarity, so that a bilateral filtering smoothing loss value is obtained. It will be appreciated that the bilateral filtering smoothing loss value can be used to characterize the difference between each pixel and its neighbors in the cartoon image generated by the first generator. Thus, it is equivalent to determine the difference between each pixel and its neighboring pixels in the cartoon image generated by the first generator by taking a comprehensive consideration of the similarity of pixel values and the similarity of spatial positions.
In one embodiment, the bilateral filtering smoothing loss function includes two kernel functions, a spatial kernel and a pixel range kernel; that is, the final weight coefficient function of the bilateral filtering smoothing loss is determined jointly by the spatial kernel and the pixel range kernel. The spatial kernel is the weight coefficient function of the spatial domain and smooths according to the spatial neighborhood of a pixel, i.e., it determines the weight coefficient from spatial position similarity. The pixel range kernel is the weight coefficient function of the pixel range domain and smooths according to the pixel value (color) difference between adjacent pixels, i.e., it determines the weight coefficient from pixel value similarity. The final weight coefficient function therefore takes both pixel value similarity and spatial position similarity into account, so the computed bilateral filtering smoothing loss value considers both aspects.
In one embodiment, the bilateral filtering smoothing loss function L_fla is given by:
L_fla = (1/N) · Σ_i Σ_{j ∈ N_S(i)} w_sp(i, j) · w_co(i, j) · ‖G(I)_i − G(I)_j‖_2;
wherein L_fla is the bilateral filtering smoothing loss function; N is the total number of pixels; N_S(i) denotes the H × H neighborhood of pixel i, and j denotes a neighboring pixel of pixel i within that neighborhood; ‖·‖_2 denotes the L2 norm (without squaring). The weight coefficient function of the spatial domain is defined as
w_sp(i, j) = exp( −[(x_i − x_j)² + (y_i − y_j)²] / (2σ_sp²) );
and the weight coefficient function of the pixel range domain is defined as
w_co(i, j) = exp( −Σ_c (I_{i,c} − I_{j,c})² / (2σ_co²) ).
Here σ denotes the standard deviation of a Gaussian kernel function: σ_sp is the standard deviation of the spatial kernel and σ_co is the standard deviation of the pixel range kernel. c denotes a color channel, and x and y denote the spatial position coordinates of a pixel; x_i, y_i are the spatial position coordinates of pixel i, and x_j, y_j are those of pixel j. I represents the input real scene sample image, and I_{i,c} − I_{j,c} represents the pixel value difference between pixel i in I and its neighboring pixel j. G(I)_i represents pixel i in the cartoon image generated by the first generator, and G(I)_j represents pixel j in the cartoon image generated by the first generator.
In one embodiment, σcoAnd σspMay be set to 0.2 and 5, respectively.
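A possible realization of this loss, written as a PyTorch-style sketch under the assumptions that pixel values are normalized to [0, 1], that the H × H neighborhood is approximated by shifted copies of the image (with wrap-around at the borders), and that the helper name and tensor layout are illustrative only:

    import math
    import torch

    def bilateral_smoothing_loss(generated, real, half_window=2, sigma_sp=5.0, sigma_co=0.2):
        # generated: G(I), real: I, both of shape (N, C, H, W), values in [0, 1].
        loss = 0.0
        for dy in range(-half_window, half_window + 1):
            for dx in range(-half_window, half_window + 1):
                if dy == 0 and dx == 0:
                    continue
                shifted_real = torch.roll(real, shifts=(dy, dx), dims=(2, 3))
                shifted_gen = torch.roll(generated, shifts=(dy, dx), dims=(2, 3))
                # spatial-domain weight: depends only on the offset between pixel i and pixel j
                w_sp = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_sp ** 2))
                # pixel-range weight: depends on the color difference in the input image I
                color_diff = ((real - shifted_real) ** 2).sum(dim=1)
                w_co = torch.exp(-color_diff / (2.0 * sigma_co ** 2))
                # L2 difference (not squared) between neighboring pixels of G(I)
                gen_diff = torch.sqrt(((generated - shifted_gen) ** 2).sum(dim=1) + 1e-8)
                loss = loss + (w_sp * w_co * gen_diff).mean()   # mean over pixels ~ (1/N) * sum
        return loss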
In the above embodiment, the supervised structural reconstruction loss and the unsupervised bilateral filtering smoothing loss are combined to train the deep neural network for image content reconstruction and abstract smoothing. The trained deep neural network can then perform high-quality image content reconstruction and abstract smoothing, which improves the quality of the images generated subsequently.
In one embodiment, stylizing the abstract cartoon image to generate a stylized cartoon image having an artistic style comprises: in the first stage, the abstract cartoon image is stylized through a first generator to generate a style cartoon image with artistic style.
It is understood that the first generator can further stylize the abstract cartoon image to generate a stylized cartoon image having an artistic style, in addition to generating the abstract cartoon image.
In this embodiment, since the first generator performs the stylization processing, stylization-related training is also performed during training. That is, the abstract network undergoes style migration training so that its first generator can perform the stylization processing.
In one embodiment, the abstract network further comprises a first discriminator. In the embodiment of the present application, the abstract network is a generative adversarial network (GAN) including a first generator and a first discriminator. Therefore, the network model parameters of the deep neural network further include parameters of the first discriminator.
In this embodiment, the sample data set further includes a second set of abstract cartoon sample diagrams. The abstract cartoon sample graph is an abstract image with missing contour edge lines. It will be appreciated that the abstract cartoon sample drawings in the second collection all have the same artistic style.
It will be appreciated that the purpose of the first discriminator is to distinguish the cartoon image generated by the first generator from the style of the abstract cartoon sample map lacking contour edge lines. The first generator and the first discriminator are adversarial: through iterative adversarial training, the first generator learns to produce cartoon images that the first discriminator cannot distinguish, so that the generated cartoon image gradually approaches the style of the abstract cartoon sample map and the style characteristics of the abstract cartoon sample map are introduced into the cartoon image generated by the first generator.
In one embodiment, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value based on the structural reconstruction loss value and the bilateral filtering smoothing loss value comprises: in each iteration, determining a structural reconstruction loss value, a bilateral filtering smoothing loss value and a style enhancement loss value, and determining a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value and the style enhancement loss value.
Specifically, the computer device may obtain a pre-designed style enhancement loss function, and construct a target loss function according to the structure reconstruction loss function, the bilateral filtering smoothing loss function, and the style enhancement loss function. It will be appreciated that the style enhancement loss function is considered in constructing the target loss function in order for the cartoon image generated by the first generator to have the style of an abstract cartoon sample map, i.e., to enable the first generator to generate a cartoon image having the style of an abstract cartoon sample map.
When training the deep neural network, the computer device may input a sample data set including a first set and a second set into the deep neural network together for iterative training, randomly select a real scene sample diagram from the first set in each iteration, randomly select a preset number (for example, 5) of abstract cartoon sample diagrams from the second set as a reference, combine with a target loss function to obtain a structural reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value corresponding to the current iteration, further determine a final target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value, thereby iteratively adjusting network model parameters according to the target loss value, wherein the network model parameters include parameters of a first generator and parameters of a first discriminator, until the iteration stops. Likewise, the target loss value is not limited and may include other loss values. That is, the target loss function may also include other loss functions.
It is to be understood that the stylistic enhancement loss value is determined by a first discriminator based on a first probability of a cartoon image output generated by a first generator. That is, in each iteration, the cartoon image generated by the first generator in the current iteration may be input to the first discriminator, and the first discriminator may output a probability value, that is, a first probability, according to the input cartoon image, and further determine the style enhancement loss value according to the first probability. The first probability is used for representing the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample graph. It is understood that the greater the first probability, the closer the cartoon image generated by the first generator is to the style of the abstract cartoon sample map, the harder it is for the first discriminator to recognize.
In one embodiment, the adversarial loss function L_sty-D of the first discriminator is as follows:
L_sty-D = Σ [ log D(A_y) + log(1 − D(G(I))) ];
and the style enhancement loss function L_sty of the first generator may be expressed, for example, as:
L_sty = Σ log(1 − D(G(I)));
wherein A_y ∈ {a_y | y = 1, …, Y} ⊆ A, A being the second set and a_y the y-th abstract cartoon sample map; I and G(I) respectively represent the input real scene sample image and the cartoon image generated by the first generator; D is the first discriminator and D(·) the output of the first discriminator; D(A_y) is the output obtained after the y-th abstract cartoon sample map A_y is input to the first discriminator, and D(G(I)) is the first probability obtained after the cartoon image generated by the first generator is input to the first discriminator.
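The two adversarial terms can also be written as losses to be minimized, roughly as in the following PyTorch-style sketch; the sign convention, the use of the mean instead of the sum, and the small eps added for numerical stability are assumptions of the sketch, not requirements of this embodiment.

    import torch

    def style_discriminator_loss(d_real, d_fake, eps=1e-8):
        # d_real: D(A_y) for the sampled abstract cartoon reference images;
        # d_fake: D(G(I)) for cartoon images produced by the first generator.
        # The first discriminator pushes D(A_y) toward 1 and D(G(I)) toward 0.
        return -(torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean())

    def style_enhancement_loss(d_fake, eps=1e-8):
        # The first generator is trained to make D(G(I)) large, i.e. to minimize log(1 - D(G(I))).
        return torch.log(1.0 - d_fake + eps).mean()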
It should be noted that the structure reconstruction loss value is determined in a supervised manner, while the bilateral filtering smoothing loss value and the style enhancement loss value are determined in an unsupervised manner. Image content reconstruction and abstract smoothing are thus performed based on the supervised structure reconstruction loss value and the unsupervised bilateral filtering smoothing loss value, and stylized migration is performed using the unsupervised generative adversarial loss, so that in the first stage a high-quality, abstractly smoothed and stylized cartoon image is generated.
In one embodiment, the obtaining of the second set comprises: acquiring a third set of original cartoon images; extracting contour edge lines of each original cartoon image according to a pre-trained line tracking network; generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from an original cartoon image.
Wherein the original cartoon images in the third set have the same artistic style. The abstract cartoon sample drawings in the second collection then also have the same artistic style as the original cartoon images in the third collection.
It should be noted that, according to needs, an original cartoon image with a desired artistic style may be prepared to obtain a third set, so that an abstract cartoon sample drawing belonging to the artistic style is obtained based on the third set, and the abstract cartoon sample drawing is trained as the second set, so that the desired cartoon image with the artistic style may be flexibly generated. Namely, cartoon images with different artistic styles can be generated according to training data with different artistic styles.
For example, assuming that a style a is desired to be generated, a third set of styles a may be prepared and trained to obtain a second set of styles a, and the trained first generator may convert the real scene images into cartoon images of style a. Assuming that the B style is desired to be generated, a third set of B styles may be prepared, and trained to obtain a second set of B styles, and the trained first generator may convert the real scene images into B-style cartoon images.
The line tracing network is a deep neural network trained in advance for extracting edge lines. That is, it extracts the edge lines of the salient regions of an image and ignores unnecessary lines in non-salient regions.
In one embodiment, the line tracing network, which may be a fully convolutional deep neural network, includes an encoder and a decoder.
Specifically, the computer device may train a line tracing network in advance, input the preset third set of original cartoon images into the line tracing network, and extract the contour edge lines of each original cartoon image in the third set through the line tracing network. The computer device may convert the line image into a binary mask, in which all edge pixels are assigned 1 and all other pixels are assigned 0. Further, the computer device may use a Gaussian filter (such as a Gaussian filter with a standard deviation of 5) to generate a Gaussian-blurred version of the cartoon image. The computer device may then multiply the Gaussian-blurred version of the cartoon image by the binary mask to obtain a blurred edge region, and replace the corresponding edge region in the original cartoon image with the blurred edge region, thereby obtaining an abstract cartoon sample image with missing edge lines and, in turn, the second set. It can be understood that the original cartoon images and the abstract cartoon sample images generated from them correspond to each other one by one.
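The construction of one abstract cartoon sample image from an original cartoon image can be sketched as follows (an OpenCV/NumPy illustration; the function name, the blur parameter and the threshold used to binarize the line image are assumptions made for the sketch):

    import cv2
    import numpy as np

    def make_abstract_cartoon_sample(cartoon, line_image, blur_sigma=5.0):
        # cartoon: original cartoon image, uint8 array of shape (H, W, 3);
        # line_image: contour edge lines extracted by the pre-trained line tracing network, (H, W).
        mask = (line_image > 127).astype(np.float32)[..., None]      # binary mask: 1 on edge pixels
        blurred = cv2.GaussianBlur(cartoon, (0, 0), blur_sigma)       # Gaussian-blurred version
        # replace the edge regions of the original cartoon with their blurred counterparts,
        # which removes the contour edge lines and yields the abstract cartoon sample image
        abstract = cartoon.astype(np.float32) * (1.0 - mask) + blurred.astype(np.float32) * mask
        return abstract.astype(np.uint8)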
In one embodiment, the training step of the line tracing network comprises: obtaining a line tracing data set, wherein the line tracing data set comprises a plurality of groups of cartoon-line image pairs, and each group comprises a sample cartoon image and its corresponding line image; and performing supervised iterative training on the line tracing network according to the line tracing data set to obtain the final line tracing network.
It will be appreciated that the line tracing network may be trained in a supervised manner using the L1 norm loss function. To enhance the diversity of the data and the tolerance to background regions, some background clutter images may be included in the line tracing data set, making the line tracing network focus on important structural edges. With the line tracing network, contour edge lines of cartoon images of various styles can be extracted.
In the embodiment, the contour edge lines of the original cartoon images can be accurately extracted through the pre-trained line tracking network, so that an abstract cartoon sample image with higher reference value can be generated according to the extracted contour edge lines and the original cartoon images. Then, a more accurate deep neural network can be trained based on the abstract cartoon sample graph, and a cartoon image with higher quality can be generated based on the deep neural network.
In one embodiment, the deep learning network is a two-stage neural network, and the abstract network is the first-stage abstract network in the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network. In this embodiment, the step 206 of generating contour edge lines of the style cartoon image to obtain the cartoon image after cartoonization of the real scene image includes: in the second stage, inputting the style cartoon image generated in the first stage into a second generator in the second-stage line drawing network of the trained two-stage neural network, and performing contour edge line generation processing on the style cartoon image through the second generator to obtain the cartoon image after cartoonization of the real scene image; the second-stage line drawing network is a deep neural network for generating contour edge lines in the second stage.
The two-stage neural network comprises two stages of neural networks. It is understood that the neural networks of different stages are used to perform different processing, and the output of the neural network of the previous stage is used as the input data of the neural network of the subsequent stage. It should be noted that, during training of the two-stage neural network, the inputs at different stages may be different sample sets, or sample sets of the same type, in the sample data set.
The two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network. The first-stage abstract network is the deep neural network that performs abstract smoothing and stylization in the first stage. The second-stage line drawing network is the deep neural network that describes the contour edge lines of the image output by the first stage. That is, the image output by the first stage is input to the second generator in the second-stage line drawing network, so that the second generator performs contour edge line generation processing on the style cartoon image and outputs the final cartoon image after cartoonization of the real scene image.
In the above embodiment, based on the two-stage neural network, the image is abstractly smoothed and stylized (i.e., performed in the first stage) and the edge generation processing (i.e., performed in the second stage) is performed in stages, and the contour edge lines are added to the result output in the first stage, so that lines are prevented from being generated in an area needing smoothing, and harmony and unity of line feeling and image content smoothing are achieved.
In one embodiment, the abstract cartoon sample map is an abstract image with contour edge lines removed from the original cartoon image; the sample data set also includes a third set of original cartoon images corresponding to the abstract cartoon sample graph. In this embodiment, in each iteration, determining a structural reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value, and determining a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value includes: in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value.
It is understood that the computer device may obtain a pre-designed edge line distribution penalty function. That is, the target loss function also includes an edge line distribution loss function.
Specifically, in each iteration training, the computer device may input the abstract cartoon sample diagram into a second generator of the line description network to be trained, so as to perform supervised iteration training on the line description network, and in each iteration, according to the edge line distribution loss function, determine a difference between an image with a line generated by the second generator and an original cartoon image corresponding to the abstract cartoon sample diagram after the abstract cartoon sample diagram is input into the second generator, thereby obtaining an edge line distribution loss value. That is, the edge line distribution loss value is used to represent the difference between the image with the line generated when the abstract cartoon sample diagram is used as the input of the second generator and the original cartoon image corresponding to the abstract cartoon sample diagram.
The computer device can determine a target loss value corresponding to the iteration according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value. Thereby adjusting network model parameters of the two-stage neural network according to the target loss value. The network model parameters also comprise parameters corresponding to the first generator, the first discriminator and the second generator respectively.
In one embodiment, the edge line distribution loss function L_eas may be formulated, for example, as a pixel-wise distance between the generated image with lines and the corresponding original cartoon image:
L_eas = Σ ‖L(A_y) − C_GT‖_2;
wherein L(A_y) represents the image with lines generated by the second generator with the abstract cartoon sample map A_y as input, and C_GT represents the original cartoon image corresponding to the abstract cartoon sample map A_y.
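A minimal sketch of this supervised term, assuming the same per-pixel L2 reading as for the structural reconstruction loss (the exact norm used is not fixed here):

    import torch

    def edge_line_distribution_loss(lined, original_cartoon):
        # lined: L(A_y), the image with lines produced by the second generator from the
        # abstract cartoon sample image; original_cartoon: C_GT, its paired original cartoon image.
        return torch.norm(lined - original_cartoon, p=2, dim=1).mean()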
It should be noted that lines are drawn for the abstract cartoon sample map only in the training phase, so that supervised iterative training can be performed. When the trained line drawing network is used, only the image output by the first generator of the first stage is taken as input, and line drawing of the abstract cartoon sample map no longer occurs.
In this embodiment, supervised line drawing training is performed in the second stage, so that the second-stage line drawing network for generating contour edge lines can be trained accurately. When the network model is used, contour edge lines can thus be accurately drawn on the result output by the first stage based on the second-stage line drawing network, realizing the line feel of the cartoon image while avoiding messy and redundant lines.
In one embodiment, the second-stage line drawing network further comprises a second discriminator; the network model parameters further include parameters of the second discriminator.
It is to be understood that, in the present embodiment, the second-stage line drawing network is a generative adversarial network including a second generator and a second discriminator. The network model parameters of the two-stage neural network therefore further include the parameters of the second discriminator.
The purpose of the second discriminator is to distinguish the image with lines generated by the second generator from the original cartoon image in terms of line intensity. The second generator and the second discriminator are adversarial: through iterative adversarial training, the second generator learns to produce images with lines that the second discriminator cannot distinguish, so that the line intensity of the generated image gradually approaches that of the corresponding original cartoon image and the generated lines are enhanced.
It is understood that both the image with lines generated by the second generator and the abstract cartoon sample map, which lacks the true contour edge lines, are fake sample data, while the original cartoon image is the real sample data. This guides the second generator to generate line-enhanced images that cannot be distinguished by the second discriminator.
When the second generator performs adversarial training with the second discriminator, the image output by the first generator is input into the second generator, and unsupervised iterative training is performed.
In one embodiment, in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, and an edge line distribution loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, and the edge line distribution loss value includes: in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value and an edge line enhancement loss value; and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value.
Specifically, the computer device may obtain a pre-designed edge line enhancement loss function, and construct a target loss function according to a structure reconstruction loss function, a bilateral filtering smoothing loss function, a style enhancement loss function, an edge line distribution loss function, and an edge line enhancement loss function. It will be appreciated that the edge line enhancement loss function is considered in constructing the target loss function in order to approximate the intensity of the lines of the image generated by the second generator to the intensity of the lines of the original cartoon image.
Then, when the two-stage neural network is trained, the sample data set is input into the two-stage neural network for iterative training. In each iteration, the computer device may obtain, in combination with the target loss function, a structural reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value, and the edge line enhancement loss value corresponding to the current iteration, and further obtain a final target loss value, thereby iteratively adjusting the network model parameters according to the target loss value until the iteration is stopped.
It can be understood that the edge line enhancement loss value is determined by the second probability output by the second discriminator after the image with lines generated by the second generator with the image generated by the first generator as input is input to the second discriminator; and the second probability is used for representing the line intensity difference between the image with the lines generated by the second generator and the original cartoon image. That is, in each iteration, the image generated by the first generator may be input to the second generator, the image with lines generated by the second generator according to the input image may be input to the second discriminator, and the second discriminator may output a probability value, that is, a second probability. And determining an edge line enhancement loss value according to the second probability. It can be understood that the greater the second probability is, the closer the line intensity of the image generated by the second generator and the original cartoon image is, the more difficult it is for the second discriminator to recognize, thereby implementing enhancement processing on the contour edge line.
In one embodiment, the edge line enhancement adversarial loss function L_eau-D of the second discriminator may be formulated, for example, as:
L_eau-D = Σ [ log D_L(C_GT) + log(1 − D_L(L(G(I)))) + log(1 − D_L(A_y)) ];
and in one embodiment, the edge line enhancement loss function L_eau corresponding to the second generator may be formulated, for example, as:
L_eau = Σ log(1 − D_L(L(G(I))));
wherein A_y is the abstract cartoon sample map, and C_GT represents the original cartoon image corresponding to the abstract cartoon sample map A_y; D_L denotes the second discriminator and D_L(·) the result output by the second discriminator; G(I) and L(G(I)) indicate the image generated by the first generator of the first stage and the image generated by the second generator of the second stage, respectively. D_L(L(G(I))) represents the second probability output after the image generated by the second generator is input to the second discriminator. D_L(C_GT) represents the result output after the original cartoon image is input to the second discriminator; D_L(A_y) represents the result output after the abstract cartoon sample map A_y is input to the second discriminator.
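These two terms can likewise be written as losses to be minimized (a PyTorch-style sketch; the sign convention, averaging and eps are assumptions of the sketch):

    import torch

    def edge_enhance_discriminator_loss(d_real, d_fake_lined, d_fake_abstract, eps=1e-8):
        # d_real: D_L(C_GT); d_fake_lined: D_L(L(G(I))); d_fake_abstract: D_L(A_y).
        # Original cartoon images are real samples; the second generator's output and the
        # edge-line-free abstract cartoon samples are both treated as fake samples.
        real_term = torch.log(d_real + eps).mean()
        fake_terms = (torch.log(1.0 - d_fake_lined + eps).mean()
                      + torch.log(1.0 - d_fake_abstract + eps).mean())
        return -(real_term + fake_terms)

    def edge_line_enhancement_loss(d_fake_lined, eps=1e-8):
        # The second generator is trained to make D_L(L(G(I))) large.
        return torch.log(1.0 - d_fake_lined + eps).mean()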
In one embodiment, the final objective loss function L_all is a weighted sum of the above loss terms:
L_all = α·L_ssc + β·L_sty + γ·L_fla + δ·L_eas + θ·L_eau;
wherein L_ssc is the structural reconstruction loss function, L_sty the style enhancement loss function, L_fla the bilateral filtering smoothing loss function, L_eas the edge line distribution loss function, and L_eau the edge line enhancement loss function; α, β, γ, δ and θ are weight parameters. In one embodiment, α, β, γ, δ and θ may be heuristically set to 10, 2, 1.5, 50 and 1, respectively.
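Combining the terms is then a plain weighted sum, as in the sketch below; which Greek letter attaches to which term is an assumption of the sketch, only the heuristic values 10, 2, 1.5, 50 and 1 are taken from the text above.

    def total_loss(l_ssc, l_sty, l_fla, l_eas, l_eau,
                   alpha=10.0, beta=2.0, gamma=1.5, delta=50.0, theta=1.0):
        # Weighted sum of the five loss terms used as the final training objective.
        return alpha * l_ssc + beta * l_sty + gamma * l_fla + delta * l_eas + theta * l_eau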
In other words, in the second stage, the edge line distribution loss value is determined in a supervised manner and the edge line enhancement loss value is determined in an unsupervised manner. Thus, in the second stage, contour edge lines are drawn on the cartoon image generated in the first stage through supervised edge line distribution training and unsupervised edge line enhancement training, which reduces the proliferation of unnecessary lines and improves the quality of the final cartoon image.
To facilitate understanding of the two-stage neural network, a schematic description of the network structure of the two-stage neural network will now be provided with reference to fig. 3. The two-stage neural network includes a first stage abstract network and a second stage line drawing network. The structures of the first generator and the second generator are mainly illustrated in fig. 3, and the structures of the first discriminator and the second discriminator are not shown.
Referring to FIG. 3, the first generator G in the first-stage abstract network is a fully convolutional network composed of an encoder E_1 and a decoder D_1. The encoder E_1 and the decoder D_1 may each stack two convolution blocks in sequence, each block containing a convolution layer with a kernel size of 3 × 3, an instance normalization layer and a ReLU (linear rectification function) activation layer. Several residual blocks R_1 (e.g., 9) may be arranged between the encoder E_1 and the decoder D_1 to supplement useful features and eliminate artifacts.
Further, to make full use of the original input, a flat convolution block (Flat Conv) F_1e may be provided before the encoder E_1. The flat block F_1e may be set up using 64 convolution kernels of size 7 × 7 to extend the receptive field and reintegrate features so as to learn a more global representation. After the decoder D_1, a flat conversion block F_1d may be provided to convert the features output by the decoder D_1 into a three-channel output. Reflection padding is used to reduce artifacts at the image boundaries. At the end of the first generator G, a skip residual connection from input to output is used, followed by a Tanh (hyperbolic tangent) activation layer T to obtain the abstract result finally generated by the first generator, i.e., the generated abstract cartoon image.
The first discriminator in the first-stage abstract network consists of three convolution blocks with a kernel size of 4 × 4. After the normalization layer of each convolution block, a Leaky ReLU (leaky linear rectification unit) is used as the activation function. The final classification result may be obtained by converting the output features into binary values using a convolution kernel. It can be appreciated that the first discriminator thereby applies a patch-level structure to focus on local feature learning. The structure of the first discriminator is not shown in the figure.
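The first discriminator can be sketched as a small patch-level network (a PyTorch illustration; the strides, channel widths, use of instance normalization and the final sigmoid are assumptions, as the embodiment only fixes the three 4 × 4 convolution blocks, the Leaky ReLU activations and the single-channel output convolution):

    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        # Three 4x4 convolution blocks with Leaky ReLU activations, followed by a convolution
        # that maps the features to a single-channel probability map (patch-level judgement).
        def __init__(self, in_channels=3, base_channels=64):
            super().__init__()
            layers, ch = [], in_channels
            for i in range(3):
                out_ch = base_channels * (2 ** i)
                layers += [nn.Conv2d(ch, out_ch, kernel_size=4, stride=2, padding=1),
                           nn.InstanceNorm2d(out_ch),
                           nn.LeakyReLU(0.2, inplace=True)]
                ch = out_ch
            layers += [nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1), nn.Sigmoid()]
            self.model = nn.Sequential(*layers)

        def forward(self, x):
            return self.model(x)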
The structural composition of the second-stage line drawing network is similar to that of the first-stage abstract network. The second discriminator adopts the structure of the first discriminator (the structure of the second discriminator is likewise not shown in FIG. 3), and the second generator L adopts a structure similar to that of the first generator G.
As can be seen from FIG. 3, the second generator L also comprises an encoder E_2 and a decoder D_2, with flat blocks F_2e and F_2d before the encoder E_2 and after the decoder D_2 respectively, and several residual blocks R_2 between the encoder E_2 and the decoder D_2. The second generator L differs from the first generator G in two places. The first difference is that the second generator uses fewer residual blocks than the first generator (e.g., 9 residual blocks are used by the first generator and 6 residual blocks by the second generator). The second generator can use fewer residual blocks because the edge line information is simpler than the information contained in the color style, so a shallower network is more suitable. The second difference is that, compared with the first generator, the second generator does not employ a skip connection from input to output. It should be noted that FIG. 3 only illustrates the network structure of the second generator L; it does not mean that the output result G(I) of the first stage is input to F_2d and then output from F_2e. Rather, the output result G(I) of the first stage passes through the encoder in the second generator L, the encoded features then pass through the decoder of the second generator L, and the output result L(G(I)) is finally produced, i.e., the data flows sequentially through F_2e → E_2 → R_2 → D_2 → F_2d, thereby outputting L(G(I)).
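Since the two generators differ only in the number of residual blocks and the presence of the input-to-output skip connection, both can be expressed with one parameterized module (a PyTorch sketch; the downsampling strides, channel widths and the use of Tanh at the end of the second generator are assumptions not fixed by the embodiment):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
                nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
                nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
                nn.InstanceNorm2d(channels))

        def forward(self, x):
            return x + self.block(x)

    class StageGenerator(nn.Module):
        # n_res=9, use_skip=True  -> structure of the first generator G;
        # n_res=6, use_skip=False -> structure of the second generator L.
        # Input height and width are assumed divisible by 4 so the skip connection matches.
        def __init__(self, n_res=9, use_skip=True, base_channels=64):
            super().__init__()
            self.use_skip = use_skip
            self.flat_in = nn.Sequential(                    # flat conv block F_e: 64 kernels, 7x7
                nn.ReflectionPad2d(3), nn.Conv2d(3, base_channels, 7),
                nn.InstanceNorm2d(base_channels), nn.ReLU(inplace=True))
            enc, ch = [], base_channels
            for _ in range(2):                               # encoder: two 3x3 convolution blocks
                enc += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                        nn.InstanceNorm2d(ch * 2), nn.ReLU(inplace=True)]
                ch *= 2
            self.encoder = nn.Sequential(*enc)
            self.res_blocks = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_res)])
            dec = []
            for _ in range(2):                               # decoder mirrors the encoder
                dec += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2, padding=1, output_padding=1),
                        nn.InstanceNorm2d(ch // 2), nn.ReLU(inplace=True)]
                ch //= 2
            self.decoder = nn.Sequential(*dec)
            self.flat_out = nn.Sequential(                   # flat conv block F_d: back to 3 channels
                nn.ReflectionPad2d(3), nn.Conv2d(ch, 3, 7))

        def forward(self, x):
            out = self.flat_out(self.decoder(self.res_blocks(self.encoder(self.flat_in(x)))))
            if self.use_skip:                                # skip residual connection, then Tanh
                out = out + x
            return torch.tanh(out)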
A brief flow of the cartoon-like processing for a real scene will now be described with the network structure shown in fig. 3.
A brief flow in the two-stage network training process will be described. In the training process, the sample data set comprises a first set P of real scene sample graphs, a second set A of abstract cartoon sample graphs and a third set C of original cartoon images corresponding to the abstract cartoon sample graphs. Then, in each iterative training, the first set is input into a first-stage abstract network to be trained, the first generator G carries out image reconstruction processing and bilateral filtering smoothing processing, and an abstract cartoon image G (I) is output. Wherein, in the training phase, I represents a real scene sample graph in the first set P. Therefore, the structural reconstruction loss value of the training can be determined according to the structural difference between the cartoon image G (I) and the real scene sample image I, and the difference between adjacent pixels in the cartoon image G (I) is determined in consideration of the similarity of pixel values and the similarity of spatial positions, so that the bilateral filtering smoothing loss value is obtained.
The first discriminator may randomly select a preset number of abstract cartoon sample images from the second set a as a reference to discriminate the probability (i.e., the first probability) that the first generator G outputs the style of the cartoon image G (i) belonging to the abstract cartoon sample images, thereby calculating the style enhancement loss value. Shown in fig. 3 are 5 abstract cartoon sample views (5 images as shown in dashed box 302).
In each iterative training, for training of the second-stage line drawing network, one abstract cartoon sample map in the second set (e.g., A_y shown in FIG. 3) may be selected as input, the original cartoon image C_GT corresponding to the abstract cartoon sample map A_y is found from the third set as a reference, and supervised training is performed based on the abstract cartoon sample map A_y and its corresponding original cartoon image C_GT to obtain the edge line distribution loss value.
In addition, in each iterative training, the G(I) output in the first stage may be input into the second-stage line drawing network, and lines are drawn on the first-stage output G(I) by the second generator, so that the cartoon image L(G(I)) with lines is generated. The cartoon image L(G(I)) generated by the second generator is input to the second discriminator, and the second discriminator, taking the original cartoon image C_GT as a reference, outputs the probability (i.e., the second probability) characterizing the line intensity difference between the cartoon image L(G(I)) and the original cartoon image C_GT, thereby obtaining the edge line enhancement loss value.
And then, determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value in the training. And adjusting parameters of the first generator, the first discriminator, the second generator and the second discriminator according to the target loss value, entering next iterative training, and iterating until the training is stopped to obtain the trained two-stage neural network.
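Putting the pieces together, one iteration of the stacked training described above can be sketched as follows, reusing the illustrative loss functions from the sketches above; in practice, updates of the discriminators with their own adversarial losses are alternated with this step and are omitted here.

    def train_step(G, D1, L_gen, D2, real_img, abstract_sample, original_cartoon, edge_mask, weights):
        # real_img: real scene sample I; abstract_sample: A_y; original_cartoon: C_GT;
        # edge_mask: binary structural-edge mask B for real_img;
        # weights: (alpha, beta, gamma, delta, theta).
        g_out = G(real_img)                          # abstract, stylised cartoon image G(I)
        lined = L_gen(g_out)                         # second stage adds contour edge lines: L(G(I))
        lined_from_sample = L_gen(abstract_sample)   # supervised branch: L(A_y)

        l_ssc = structure_reconstruction_loss(g_out, real_img, edge_mask)
        l_fla = bilateral_smoothing_loss(g_out, real_img)
        l_sty = style_enhancement_loss(D1(g_out))
        l_eas = edge_line_distribution_loss(lined_from_sample, original_cartoon)
        l_eau = edge_line_enhancement_loss(D2(lined))
        return total_loss(l_ssc, l_sty, l_fla, l_eas, l_eau, *weights)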
After the two-stage neural network is trained, when the two-stage neural network is used for cartoon processing of a real scene image, only generators in a first-stage abstract network and a second-stage line drawing network are used for processing, and a discriminator is only used in the training process.
The input-output flow during use of the two-stage neural network will now be briefly described. A real scene image I (in the use phase, I is understood to be the real scene image to be cartoonized) is input into the first generator G, which generates an abstract cartoon image and stylizes it to produce a style abstract image with an artistic style (in the use phase, G(I) is understood to be the abstract image with artistic style output by the first generator G, i.e., the style abstract image). The style abstract image is then input into the second generator L, which outputs the final cartoon image (in the use phase, L(G(I)) is the final cartoon image output by the second generator).
In one embodiment, the iterative training includes structure reconstruction iterative training and network stacking iterative training.
It will be appreciated that the entire two-stage neural network may be iteratively trained in an end-to-end manner, i.e., the first stage abstract network and the second stage line delineation network are iteratively trained simultaneously. In addition, the iterative training can be split into structure reconstruction iterative training and network stack iterative training, so that the training process can be accelerated, and the convergence can be improved.
The structure reconstruction iterative training is used for training an initialized abstract network for image reconstruction processing. The network stacking iterative training stacks the initialized abstract network and the second-stage line drawing network together for iterative training. That is, the first-stage abstract network is initially trained through structure reconstruction training to obtain the initialized abstract network. The initialized abstract network only has the image reconstruction function and cannot yet perform abstract smoothing, so the initialized abstract network and the second-stage line drawing network are stacked together for iterative training, thereby producing a first-stage abstract network capable of image reconstruction and abstract smoothing and a second-stage line drawing network capable of generating lines.
In this embodiment, the sample data set includes a first set of real scene sample diagrams, a second set of abstract cartoon sample diagrams, and a third set of original cartoon images corresponding to the abstract cartoon sample diagrams. Training the deep neural network, comprising: inputting the first set into an abstract network to be trained for structure reconstruction iterative training, determining a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjusting parameters of a first generator according to the structure reconstruction loss value until an initialization training stop condition is reached to obtain an initialization abstract network; inputting the first set and the second set into an initialization abstract network, inputting the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration; determining a comprehensive loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value; adjusting parameters of a first generator and a first discriminator of the initialized abstract network and parameters of a second generator and a second discriminator of a second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops to obtain a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network.
Specifically, first, the computer device may pre-train an initialization abstraction network for implementing image reconstruction. That is, the computer device may input the first set into the abstract network to be trained to perform structure reconstruction iterative training, determine a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjust parameters of the first generator according to the structure reconstruction loss value until an initialization training stop condition is reached, so as to obtain an initialized abstract network. In this way, the resulting initialized abstract network can be used for image reconstruction to generate a base abstract image. In one embodiment, the initialized abstract network can be trained in a small number of iterations (e.g., 10 iterations) of the iterative training of the structure reconstruction.
Then, the computer device can input the first set and the second set into the initialized abstract network, input the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, determine a comprehensive loss value in each iteration, adjust parameters of a first generator and a first discriminator of the initialized abstract network and parameters of a second generator and a second discriminator of the second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops, and obtain a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network; the comprehensive loss value is obtained by determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value which are determined in each iteration of network stacking iterative training; and the comprehensive loss value is a target loss value determined by each iteration in the network stack iterative training. Equivalently, in each iteration of the network stacking iterative training, a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value are determined, and a target loss value in each iteration is determined according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value, so that model parameters of the initialized abstract network and the line drawing network at the second stage to be trained are adjusted, and the training in the aspects of structure reconstruction, bilateral filtering smoothing, style enhancement and edge line distribution is also performed in the network stacking iterative training.
It can be understood that the edge line distribution loss value is obtained by using the abstract cartoon sample graph in the second set as an input and using the corresponding original cartoon image as an output reference and performing supervised training, so as to train the second-stage line drawing network in a supervised manner.
In one embodiment, the composite loss value may further include an edge line enhancement loss value. It is understood that the edge line enhancement loss value is trained unsupervised to enable the second-stage line tracing network to generate enhanced, sharp contour edge lines. The training of the aspects of structure reconstruction, bilateral filtering smoothing, style enhancement, edge line distribution and edge line enhancement is performed in the network stacking iterative training.
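The two-phase schedule (structure reconstruction pre-training followed by stacked training) can be sketched as follows; the optimizer choice, learning rate, epoch counts and the assumption that edge masks are precomputed with the line tracing network are illustrative only, and the discriminators are in practice updated in alternation with their own losses rather than omitted as here.

    import torch

    def train_two_stage(G, D1, L_gen, D2, first_set, paired_set, init_epochs=10, stack_epochs=200):
        # first_set yields (real_img, edge_mask); paired_set yields
        # (real_img, edge_mask, abstract_sample, original_cartoon) built from the three sets.
        g_optim = torch.optim.Adam(G.parameters(), lr=2e-4)
        # phase 1: structure reconstruction iterative training of the abstract network only
        for _ in range(init_epochs):
            for real_img, edge_mask in first_set:
                loss = structure_reconstruction_loss(G(real_img), real_img, edge_mask)
                g_optim.zero_grad(); loss.backward(); g_optim.step()
        # phase 2: network stacking iterative training with the full set of loss terms
        gen_optim = torch.optim.Adam(list(G.parameters()) + list(L_gen.parameters()), lr=2e-4)
        for _ in range(stack_epochs):
            for real_img, edge_mask, abstract_sample, original_cartoon in paired_set:
                loss = train_step(G, D1, L_gen, D2, real_img, abstract_sample,
                                  original_cartoon, edge_mask, weights=(10.0, 2.0, 1.5, 50.0, 1.0))
                gen_optim.zero_grad(); loss.backward(); gen_optim.step()
                # (updates of D1 and D2 with their adversarial losses would be interleaved here)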
In one embodiment, the method further comprises: in the network stacking iterative training process, when one network in the initialized abstract network and the second-stage line drawing network to be trained reaches a training stopping condition and the other network does not reach the training stopping condition, stopping adjusting network model parameters of the network reaching the training stopping condition, and continuing training the other network not reaching the training stopping condition until the training stopping condition is reached.
For example, about 100 rounds of training may produce a satisfactory first stage abstract network, and training of the first stage abstract network may be stopped. For the second stage line drawing network, 200 rounds of training are required to achieve the effect. Therefore, after stopping the training of the first-stage abstract network, the training of the second-stage line drawing network may be continued until the training stop condition is reached.
In the above embodiment, an initialized abstract network for implementing image reconstruction is trained, and then the initialized abstract network and the second-stage line drawing network are stacked together for iterative training, so that convergence can be performed more quickly, and a first-stage abstract network capable of implementing image reconstruction and abstract smoothing processing and a second-stage line drawing network capable of generating lines are obtained.
It should be understood that, although the steps in the above flowcharts are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
To demonstrate the effects that the method of the present application can produce, reference is now made to fig. 4 to 7.
Referring to fig. 4, 402 in fig. 4 is an input real scene image, and 404-408 are cartoon images of different styles generated by the input real scene image. It can be understood that the deep learning network trained by the sample data sets of different styles (i.e. abstract cartoon sample maps or original cartoon images of different styles) can be used to generate cartoon images of different styles. For example, the deep learning network for generating 404 cartoon images of genre of a artist is training an abstract cartoon sample map or original cartoon images belonging to the genre of a artist as a sample data set. For another example, the deep learning network for generating 408 the cartoon image of the genre of the C artist uses an abstract cartoon sample map or an original cartoon image of the genre of the C artist as a sample data set in training.
Fig. 5 is used to show the results in different states. Referring to fig. 5, it can be seen that (a) is an input real scene image, and (b) to (e) are intermediate results. (f) The final result, i.e., the final cartoonized image. (f) Compared with the cartoon images (b) to (e), the cartoon image has higher quality and more obvious artistic style.
FIG. 6 shows the effect of a comparison experiment between the present solution and reference methods in one embodiment. FIG. 6 shows (a) an input real scene image; (b) the cartoonization effect of CycleGAN (CycleGAN is essentially two mirror-symmetric GANs forming a ring network, which can implement cyclic style conversion); (c) the cartoonization effect of CartoonGAN (a GAN-based neural network model presented at CVPR (IEEE Conference on Computer Vision and Pattern Recognition) 2018 that converts real photos into cartoon images); (d) the cartoonization effect of CartoonGAN2 (a fine-tuned, i.e., refined, CartoonGAN result); and (e) the cartoonization effect of the present solution. Each group of effect pictures contains cartoon images of three different styles, from top to bottom the style of artist A, the style of artist B and the style of artist C. As can be seen from FIG. 6, the cartoonization effect (e) of this embodiment preserves the salient structure more accurately. For example, the salient texture of the clouds, stones and character background is better preserved compared with (b) and (d); the color abstraction is smoother and the lines are sharper compared with (b) and (d), e.g., the lines of the bridge in (b) and (d) are not clear. Therefore, the cartoonization effect of the present solution is of higher quality than the cartoon images of (b) and (d).
The cartoonization effect of the present solution also avoids generating lines in areas that need to be smooth. For example, for the clouds and sea waves, the generation of messy lines is avoided so that the picture is smoother, and the problems of color overflow, structural distortion and large artifacts seen in (c) are also avoided. Therefore, the cartoonization effect of the present solution is of higher quality than the cartoon image of (c).
Fig. 7 is a comparison diagram of different styles of cartoon images implemented by the scheme of the application. The input real scene graph, the style of the artist A, the style of the artist B and the style of the artist C are sequentially arranged from left to right. It can be understood that the method according to the scheme can generate cartoon images with different artistic styles by adopting training data with different styles.
In one embodiment, as shown in fig. 8, there is provided a processing apparatus for cartoon-making an image of a real scene, where the apparatus may adopt a software module or a hardware module, or a combination of the two modules as a part of a computer device, and the apparatus specifically includes: an obtaining module 802, an abstraction processing module 804, and a line generating module 806, wherein:
an obtaining module 802, configured to obtain a real scene image.
The abstract processing module 804 is configured to perform image reconstruction processing and abstract smoothing processing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image of the real scene image mapped on the cartoon domain; the abstract cartoon image has a significant structure in the real scene image and lacks contour edge lines in the real scene image; and carrying out stylization processing on the abstract cartoon image to generate a style cartoon image with artistic style.
The line generating module 806 is configured to generate contour edge lines of the style cartoon image to obtain the cartoon image after cartoonization of the real scene image.
In one embodiment, the real scene image is extracted from the video frame sequence in sequence; the video frame sequence comprises at least two real scene images which are arranged in sequence;
the device also includes:
the output module 808 is configured to, when the video frame sequence is an image frame sequence in a video file, generate a cartoon image according to the cartoon image after cartoon of each real scene image in the video frame sequence, or, when the video file is played, obtain and output the cartoon image corresponding to each real scene image in the video frame sequence in real time.
In one embodiment, the output module 808 is further configured to output the cartoonized video stream in real time according to the cartoonized images of the real scene images in the real-time video frame sequence when the video frame sequence is an image frame sequence in the real-time video stream.
In an embodiment, the abstract processing module 804 is further configured to input the real scene image into a trained deep neural network, so as to perform semantic extraction on the real scene image in the first stage, perform image reconstruction processing on the real scene image based on the extracted semantic information, and perform bilateral filtering smoothing processing on reconstructed image content, so as to generate an abstract cartoon image.
In one embodiment, the real scene image is input into a first generator of an abstract network in a deep neural network.
As shown in fig. 9, the apparatus further includes: a model training module 801 and an output module 808. Wherein:
a model training module 801, configured to obtain a sample data set; the sample data set comprises a first set of real scene sample graphs; inputting a sample data set into a deep neural network to be trained for iterative training, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value; in each iteration, adjusting network model parameters according to a target loss value until the iteration is stopped to obtain a trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network; the structure reconstruction loss value is used for representing the difference of the structure characteristics between the cartoon image generated by the first generator according to the real scene sample diagram and the real scene sample diagram; and the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity.
In one embodiment, the abstract processing module 804 is further configured to stylize, via the first generator, the abstract cartoon image to generate a stylized cartoon image having an artistic style.
In one embodiment, the abstract network further comprises a first discriminator; the network model parameters also comprise parameters of the first discriminator; the sample data set further comprises a second set of abstract cartoon sample diagrams; the model training module 801 is further configured to determine a structural reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value in each iteration, and determine a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value; the style enhancement loss value is determined according to a first probability output by the first discriminator for the cartoon image generated by the first generator; the first probability is used for representing the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample graph.
In one embodiment, the model training module 801 is further configured to obtain a third set of original cartoon images; the original cartoon images in the third set have the same artistic style; extracting contour edge lines of each original cartoon image according to a pre-trained line tracking network; generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from an original cartoon image.
In one embodiment, the deep learning network is a two-stage neural network, and the abstract network is the first-stage abstract network in the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network; the line generation module 806 is further configured to, in the second stage, input the style cartoon image generated in the first stage into a second generator in the second-stage line drawing network of the trained two-stage neural network, so as to perform contour edge line generation processing on the style cartoon image through the second generator and obtain the cartoon image after cartoonization of the real scene image; the second-stage line drawing network is a deep neural network for generating contour edge lines in the second stage.
In one embodiment, the abstract cartoon sample map is an abstract image with contour edge lines removed from the original cartoon image; the sample data set further comprises a third set of original cartoon images corresponding to the abstract cartoon sample graph; the network model parameters further include parameters of the second generator. The model training module 801 is further configured to determine a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, and an edge line distribution loss value in each iteration, and determine a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, and the edge line distribution loss value, where the edge line distribution loss value is used to represent a difference between an image with a line generated when the abstract cartoon sample diagram is used as an input of the second generator and an original cartoon image corresponding to the abstract cartoon sample diagram; the network model parameters also include parameters of the second generator.
In one embodiment, the second-stage line drawing network further comprises a second discriminator; the network model parameters also comprise parameters of the second discriminator; the model training module 801 is further configured to determine a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value, and an edge line enhancement loss value in each iteration, and determine a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value. The edge line enhancement loss value is determined by a second probability output by the second discriminator after the image with lines, generated by the second generator with the image generated by the first generator as input, is input to the second discriminator; and the second probability is used for representing the line intensity difference between the image with lines generated by the second generator and the original cartoon image.
In one embodiment, the iterative training includes structure reconstruction iterative training and network stacking iterative training. Structure reconstruction iterative training is used to train an initialized abstract network for image reconstruction processing; network stacking iterative training stacks the initialized abstract network and the second-stage line drawing network together for joint iterative training.
In this embodiment, the model training module 801 is further configured to obtain a first set of real scene sample images, a second set of abstract cartoon sample images, and a third set of original cartoon images corresponding to the abstract cartoon sample images; input the first set into an abstract network to be trained for structure reconstruction iterative training, determine a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjust parameters of the first generator according to the structure reconstruction loss value until an initialization training stop condition is reached, to obtain an initialized abstract network; input the first set and the second set into the initialized abstract network, input the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determine a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, and an edge line distribution loss value in each iteration; determine a comprehensive loss value from these loss values; and adjust parameters of the first generator and the first discriminator of the initialized abstract network, and parameters of the second generator and the second discriminator of the second-stage line drawing network to be trained, according to the comprehensive loss value until the iteration stops, to obtain a trained two-stage neural network. The two-stage neural network comprises the first-stage abstract network and the second-stage line drawing network, and the comprehensive loss value is the target loss value determined in each iteration of the network stacking iterative training.
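The two-phase schedule could look roughly like the sketch below; the loaders, optimizer settings and the simplified loss terms are illustrative assumptions, and practical adversarial training would also alternate generator and discriminator updates rather than sharing one objective.

```python
import torch
import torch.nn.functional as F

def train_two_stage(first_gen, first_disc, second_gen, second_disc,
                    real_loader, pair_loader, init_steps, stack_steps, lr=2e-4):
    """Phase 1: structure-reconstruction pre-training; Phase 2: stacked training of both stages."""
    opt_init = torch.optim.Adam(first_gen.parameters(), lr=lr)
    for _, real in zip(range(init_steps), real_loader):
        loss = F.l1_loss(first_gen(real), real)    # pixel-level stand-in for the reconstruction term
        opt_init.zero_grad()
        loss.backward()
        opt_init.step()

    params = (list(first_gen.parameters()) + list(first_disc.parameters()) +
              list(second_gen.parameters()) + list(second_disc.parameters()))
    opt_all = torch.optim.Adam(params, lr=lr)
    for _, (real, (sample, original)) in zip(range(stack_steps), zip(real_loader, pair_loader)):
        style_cartoon = first_gen(real)            # first stage: abstract + stylized cartoon image
        d1_prob = first_disc(style_cartoon)        # assumed to output a probability in (0, 1)
        l_rec = F.l1_loss(style_cartoon, real)                               # structure reconstruction
        l_style = F.binary_cross_entropy(d1_prob, torch.ones_like(d1_prob))  # style enhancement
        l_edge = F.l1_loss(second_gen(sample), original)                     # edge line distribution
        loss = l_rec + l_style + l_edge            # smoothing / edge-enhancement terms omitted for brevity
        opt_all.zero_grad()
        loss.backward()
        opt_all.step()
```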
For specific limitations of the apparatus for cartoonizing real scene images, reference may be made to the above limitations of the method for cartoonizing real scene images, which are not repeated here. All or part of the modules in the apparatus may be implemented in software, in hardware, or in a combination of the two. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the module.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements the method for cartoonizing real scene images. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store search data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the method for cartoonizing real scene images.
It will be appreciated by those skilled in the art that the structures shown in fig. 10 and fig. 11 are merely block diagrams of parts of the structures relevant to the present disclosure and do not constitute a limitation on the computer devices to which the present disclosure may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combined technical features, such combinations should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application and are described in relative detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for processing cartoon of real scene image is characterized by comprising the following steps:
acquiring a real scene image;
based on semantic information of the real scene image, performing image reconstruction processing and abstract smoothing processing on the real scene image to obtain an abstract cartoon image of the real scene image mapped on a cartoon domain; wherein the abstract cartoon image retains the salient structure of the real scene image and lacks the contour edge lines of the real scene image;
stylizing the abstract cartoon image to generate a style cartoon image with artistic style;
and performing contour edge line generation on the style cartoon image to obtain a cartoonized image of the real scene image.
2. The method according to claim 1, wherein the real scene image is sequentially extracted from a video frame sequence; and the video frame sequence comprises at least two real scene images arranged in order;
the method further comprises the following steps:
when the video frame sequence is a sequence of image frames in a video file, generating a cartoon video file according to the cartoonized images of the real scene images in the video frame sequence, or, when the video file is played, acquiring and outputting in real time the cartoonized images corresponding to the real scene images in the video frame sequence.
3. The method of claim 2, further comprising:
when the video frame sequence is a sequence of image frames in a real-time video stream, outputting a cartoon video stream according to the cartoonized images of the real scene images in the video frame sequence.
4. The method according to any one of claims 1 to 3, wherein the performing image reconstruction processing and abstract smoothing processing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image of the real scene image mapped on a cartoon domain comprises:
inputting the real scene image into a trained deep neural network to perform semantic extraction on the real scene image in a first stage;
based on the extracted semantic information, carrying out image reconstruction processing on the real scene image;
and carrying out bilateral filtering smoothing treatment on the reconstructed image content to generate an abstract cartoon image.
5. The method of claim 4, wherein the real scene image is input into a first generator of an abstract network in the deep neural network; the training step of the deep neural network comprises the following steps:
acquiring a sample data set; the sample data set comprises a first set of real scene sample graphs;
inputting the sample data set into a deep neural network to be trained for iterative training, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value;
in each iteration, adjusting network model parameters according to the target loss value until the iteration is stopped to obtain a trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network;
the structure reconstruction loss value is used for representing the difference of the structural characteristics between the cartoon image generated by the first generator according to the real scene sample image and the real scene sample image;
and the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity.
6. The method of claim 5, wherein the stylizing the abstract cartoon image to generate a style cartoon image with artistic style comprises:
in the first stage, stylizing the abstract cartoon image through the first generator to generate a style cartoon image with artistic style.
7. The method of claim 6, wherein the abstract network further comprises a first discriminator; the network model parameters further comprise parameters of the first discriminator; and the sample data set further comprises a second set of abstract cartoon sample diagrams;
determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value comprises:
in each iteration, determining a structural reconstruction loss value, a bilateral filtering smoothing loss value and a style enhancement loss value, and determining a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value and the style enhancement loss value;
wherein the style enhancement loss value is determined according to a first probability output by the first discriminator for the cartoon image generated by the first generator; and the first probability is used for representing the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample diagram.
8. The method of claim 7, wherein the obtaining of the second set comprises:
acquiring a third set of original cartoon images; the original cartoon images in the third set have the same artistic style;
extracting contour edge lines of the original cartoon images according to a pre-trained line tracking network;
generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from the original cartoon picture.
9. The method of claim 7, wherein the deep neural network is a two-stage neural network, and the abstract network is the first-stage abstract network of the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network; and the performing contour edge line generation on the style cartoon image to obtain the cartoonized image comprises:
in the second stage, inputting the style cartoon image generated in the first stage into a second generator of the second-stage line drawing network in the trained two-stage neural network, and performing contour edge line generation on the style cartoon image through the second generator to obtain a cartoonized image of the real scene image;
wherein the second-stage line drawing network is a deep neural network used for generating contour edge lines in the second stage.
10. The method of claim 9, wherein the abstract cartoon sample map is an abstract image with contour edge lines removed from an original cartoon image; the sample data set further comprises a third set of the original cartoon images corresponding to the abstract cartoon sample map; the network model parameters further include parameters of the second generator;
in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value and a style enhancement loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value and the style enhancement loss value comprises:
determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value;
and the edge line distribution loss value is used for representing the difference between the image with the line, which is generated when the abstract cartoon sample diagram is used as the input of the second generator, and the original cartoon image corresponding to the abstract cartoon sample diagram.
11. The method of claim 10, wherein the second-stage line drawing network further comprises a second discriminator; and the network model parameters further comprise parameters of the second discriminator;
in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value, including:
in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value and an edge line enhancement loss value;
determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value;
the edge line enhancement loss value is determined by a second probability output by the second discriminator after the image with lines, which is generated by the second generator with the image generated by the first generator as input, is input to the second discriminator; and the second probability is used for representing the line intensity difference between the image with lines generated by the second generator and the original cartoon image.
12. The method of claim 4, wherein the deep neural network training step comprises:
acquiring a first set of real scene sample images, a second set of abstract cartoon sample images and a third set of original cartoon images corresponding to the abstract cartoon sample images;
inputting the first set into an abstract network to be trained for structure reconstruction iterative training, determining a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjusting parameters of the first generator according to the structure reconstruction loss value until an initialization training stop condition is reached to obtain an initialization abstract network;
inputting the first set and the second set into the initialized abstract network, inputting the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration;
determining a comprehensive loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value;
adjusting parameters of the first generator and the first discriminator of the initialized abstract network and parameters of the second generator and the second discriminator of the second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops to obtain a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network.
13. A device for processing cartoon of real scene image, the device comprising:
the acquisition module is used for acquiring a real scene image;
the abstract processing module is used for performing image reconstruction processing and abstract smoothing processing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image of the real scene image mapped on a cartoon domain, wherein the abstract cartoon image retains the salient structure of the real scene image and lacks the contour edge lines of the real scene image; and for stylizing the abstract cartoon image to generate a style cartoon image with artistic style;
and the line generation module is used for performing contour edge line generation on the style cartoon image to obtain a cartoonized image of the real scene image.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202010440936.1A 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium Pending CN111696028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010440936.1A CN111696028A (en) 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010440936.1A CN111696028A (en) 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111696028A true CN111696028A (en) 2020-09-22

Family

ID=72476806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010440936.1A Pending CN111696028A (en) 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111696028A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132922B (en) * 2020-09-24 2024-10-15 扬州大学 Method for cartoon image and video in online class
CN112132922A (en) * 2020-09-24 2020-12-25 扬州大学 Method for realizing cartoon of images and videos in online classroom
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model
CN112508991A (en) * 2020-11-23 2021-03-16 电子科技大学 Panda photo cartoon method with separated foreground and background
CN112508991B (en) * 2020-11-23 2022-05-10 电子科技大学 Panda photo cartoon method with separated foreground and background
CN112529978A (en) * 2020-12-07 2021-03-19 四川大学 Man-machine interactive abstract picture generation method
WO2022170982A1 (en) * 2021-02-09 2022-08-18 北京字跳网络技术有限公司 Image processing method and apparatus, image generation method and apparatus, device, and medium
CN113052759A (en) * 2021-03-31 2021-06-29 华南理工大学 Scene complex text image editing method based on MASK and automatic encoder
CN113034523B (en) * 2021-04-23 2024-11-15 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and computer equipment
CN113034523A (en) * 2021-04-23 2021-06-25 腾讯科技(深圳)有限公司 Image processing method, image processing device, storage medium and computer equipment
CN113409342A (en) * 2021-05-12 2021-09-17 北京达佳互联信息技术有限公司 Training method and device for image style migration model and electronic equipment
CN113313625A (en) * 2021-05-13 2021-08-27 华南理工大学 Ink and wash painting artistic style conversion method, system, computer equipment and storage medium
CN113507573A (en) * 2021-08-13 2021-10-15 维沃移动通信(杭州)有限公司 Video generation method, video generation device, electronic device and readable storage medium
CN113838159A (en) * 2021-09-14 2021-12-24 上海任意门科技有限公司 Method, computing device and storage medium for generating cartoon image
CN113838159B (en) * 2021-09-14 2023-08-04 上海任意门科技有限公司 Method, computing device and storage medium for generating cartoon images
CN114025198A (en) * 2021-11-08 2022-02-08 深圳万兴软件有限公司 Video cartoon method, device, equipment and medium based on attention mechanism
CN114385883B (en) * 2021-12-07 2024-03-15 西北大学 Contour enhancement method for approximately simulating chapping method in style conversion
CN114385883A (en) * 2021-12-07 2022-04-22 西北大学 Contour enhancement method for approximately simulating wrinkle method in style conversion
CN114743080A (en) * 2022-03-04 2022-07-12 商汤国际私人有限公司 Image processing method and device, terminal and storage medium
CN115908962A (en) * 2022-06-13 2023-04-04 北京融合未来技术有限公司 Neural network training method, pulse signal reconstruction image generation method and device
CN115908962B (en) * 2022-06-13 2023-11-14 北京融合未来技术有限公司 Training method of neural network, pulse signal reconstruction image generation method and device
CN116012258A (en) * 2023-02-14 2023-04-25 山东大学 Image harmony method based on cyclic generation countermeasure network
CN116012258B (en) * 2023-02-14 2023-10-13 山东大学 Image harmony method based on cyclic generation countermeasure network

Similar Documents

Publication Publication Date Title
CN111696028A (en) Method and device for processing cartoon of real scene image, computer equipment and storage medium
Liu et al. Shadow removal by a lightness-guided network with training on unpaired data
CN111489287B (en) Image conversion method, device, computer equipment and storage medium
Dolhansky et al. Eye in-painting with exemplar generative adversarial networks
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Liu et al. Unsupervised sketch to photo synthesis
Tripathy et al. Facegan: Facial attribute controllable reenactment gan
CN111445410A (en) Texture enhancement method, device and equipment based on texture image and storage medium
CN109255357B (en) RGBD image collaborative saliency detection method
CN111489405B (en) Face sketch synthesis system for generating confrontation network based on condition enhancement
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
CN111275784A (en) Method and device for generating image
Galteri et al. Deep 3d morphable model refinement via progressive growing of conditional generative adversarial networks
CN114120389A (en) Network training and video frame processing method, device, equipment and storage medium
CN111640172A (en) Attitude migration method based on generation of countermeasure network
Cong et al. Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image
Igorevich Road images augmentation with synthetic traffic signs using neural networks
CN117994480A (en) Lightweight hand reconstruction and driving method
RU2713695C1 (en) Textured neural avatars
Li et al. SPN2D-GAN: semantic prior based night-to-day image-to-image translation
CN116958451B (en) Model processing, image generating method, image generating device, computer device and storage medium
Liao et al. Self-supervised random mask attention GAN in tackling pose-invariant face recognition
Khan et al. Towards monocular neural facial depth estimation: Past, present, and future
Stahl et al. Ist-style transfer with instance segmentation
Thakur et al. White-box cartoonization using an extended gan framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination