
CN111696028A - Method and device for cartoonizing real scene images, computer device, and storage medium

Info

Publication number
CN111696028A
Authority
CN
China
Prior art keywords
cartoon
image
loss value
abstract
real scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010440936.1A
Other languages
Chinese (zh)
Inventor
何盛烽
李思敏
孙子荀
刘婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Tencent Technology Shenzhen Co Ltd
Original Assignee
South China University of Technology SCUT
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Tencent Technology Shenzhen Co Ltd filed Critical South China University of Technology SCUT
Priority to CN202010440936.1A
Publication of CN111696028A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/047 - Probabilistic or stochastic networks
    • G06N3/08 - Learning methods
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/181 - Segmentation; Edge detection involving edge growing; involving edge linking
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20024 - Filtering details
    • G06T2207/20028 - Bilateral filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an artificial-intelligence-based method and device for cartoonizing real scene images, a computer device, and a storage medium. The method comprises the following steps: acquiring a real scene image; performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines; stylizing the abstract cartoon image to generate a style cartoon image with an artistic style; and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image. The method can improve the quality of the generated cartoon images.

Description

Method and device for cartoonizing real scene images, computer device, and storage medium
Technical Field
The present application relates to the fields of artificial intelligence and image processing, and in particular to a method and apparatus for cartoonizing real scene images, a computer device, and a storage medium.
Background
With the rapid development of science and technology, advanced image processing techniques keep emerging, and their application scenarios are becoming increasingly broad. Cartoonizing real scene images is one such scenario.
Traditional cartoonization methods convert a real picture using manually designed operators or detection parameters. They do not take the content of the picture into account, only apply a fixed transformation, and involve no artistic processing, so the quality of the generated cartoon images is poor. How to generate high-quality cartoon images with other techniques (e.g., artificial intelligence techniques) is therefore a problem worth considering.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for cartoonizing real scene images that can improve image quality, so as to solve the above technical problems.
A method for cartoonizing real scene images comprises the following steps:
acquiring a real scene image;
performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines;
stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
A device for cartoonizing real scene images, comprising:
an acquisition module, used for acquiring a real scene image;
an abstract processing module, used for performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines, and for stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and a line generation module, used for generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
In one embodiment, the real scene image is extracted from the video frame sequence in sequence; the video frame sequence comprises at least two real scene images which are arranged in sequence;
the device still includes:
and the output module is used for, when the video frame sequence is an image frame sequence in a video file, generating a cartoon video file according to the cartoonized image of each real scene image in the video frame sequence, or acquiring and outputting, in real time while the video file is played, the cartoonized image corresponding to each real scene image in the video frame sequence.
In one embodiment, the output module is further configured to, when the video frame sequence is an image frame sequence in a real-time video stream, output a cartoonized video stream in real time according to the cartoonized images of the real scene images in the video frame sequence.
In one embodiment, the abstract processing module is further configured to input the real scene image into a trained deep neural network, so as to perform semantic extraction on the real scene image in the first stage, perform image reconstruction processing on the real scene image based on the extracted semantic information, and perform bilateral filtering smoothing processing on reconstructed image content, thereby generating the abstract cartoon image.
In one embodiment, the real scene image is input into a first generator of an abstract network in the deep neural network; the device further includes:
the model training module is used for acquiring a sample data set; the sample data set comprises a first set of real scene sample graphs; inputting a sample data set into a deep neural network to be trained for iterative training, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value; in each iteration, adjusting network model parameters according to a target loss value until the iteration is stopped to obtain a trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network; the structure reconstruction loss value is used for representing the difference of the structure characteristics between the cartoon image generated by the first generator according to the real scene sample diagram and the real scene sample diagram; and the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity.
In one embodiment, the abstract processing module is further configured to perform stylization on the abstract cartoon image through the first generator in the first stage to generate a stylized cartoon image with artistic style.
In one embodiment, the abstract network further comprises a first discriminator; the network model parameters further comprise parameters of the first discriminator; the sample data set further comprises a second set of abstract cartoon sample images; the model training module is further configured to determine, in each iteration, a structure reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value, and to determine the target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value; the style enhancement loss value is determined by the first discriminator according to a first probability output for the cartoon image generated by the first generator; the first probability is used to represent the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample images.
In one embodiment, the model training module is further configured to obtain a third set of original cartoon images; the original cartoon images in the third set have the same artistic style; extracting contour edge lines of each original cartoon image according to a pre-trained line tracking network; generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from an original cartoon image.
In one embodiment, the deep neural network is a two-stage neural network, and the abstract network is the first-stage abstract network in the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network; the line generation module is further configured to, in the second stage, input the style cartoon image generated in the first stage into a second generator of the second-stage line drawing network in the trained two-stage neural network, so as to generate contour edge lines for the style cartoon image through the second generator and obtain the cartoonized image of the real scene image; the second-stage line drawing network is a deep neural network used for generating contour edge lines in the second stage.
In one embodiment, the abstract cartoon sample map is an abstract image with contour edge lines removed from the original cartoon image; the sample data set further comprises a third set of original cartoon images corresponding to the abstract cartoon sample graph; the network model parameters further include parameters of the second generator; the model training module is further used for determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value; and the edge line distribution loss value is used for representing the difference between the image with the line generated by taking the abstract cartoon sample diagram as the input of the second generator and the original cartoon image corresponding to the abstract cartoon sample diagram.
In one embodiment, the second-stage line drawing network further comprises a second discriminator; the network model parameters further comprise parameters of the second discriminator; the model training module is further configured to determine, in each iteration, a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value, and an edge line enhancement loss value, and to determine the target loss value according to these five loss values; the edge line enhancement loss value is determined from a second probability output by the second discriminator after the image with lines, generated by the second generator from the image produced by the first generator, is input into the second discriminator; the second probability is used to represent the line intensity difference between the image with lines generated by the second generator and the original cartoon image.
In one embodiment, the model training module is further configured to obtain a first set of real scene sample images, a second set of abstract cartoon sample images, and a third set of original cartoon images corresponding to the abstract cartoon sample images; inputting the first set into an abstract network to be trained for structure reconstruction iterative training, determining a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjusting parameters of a first generator according to the structure reconstruction loss value until an initialization training stop condition is reached to obtain an initialization abstract network; inputting the first set and the second set into an initialization abstract network, inputting the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration; determining a comprehensive loss value according to the structure reconstruction loss value, the bilateral filtering smooth loss value, the style enhancement loss value and the edge line distribution loss value, adjusting parameters of a first generator and a first discriminator of an initialized abstract network and parameters of a second generator and a second discriminator of a second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops, and obtaining a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a real scene image;
performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines;
stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a real scene image;
performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain, where the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines;
stylizing the abstract cartoon image to generate a style cartoon image with an artistic style;
and generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
The above method, device, computer device, and storage medium for cartoonizing real scene images deeply understand the key semantic information of the real scene image, and perform image reconstruction and abstract smoothing on the real scene image based on that semantic information to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain. Because image reconstruction and abstract smoothing are combined, the salient structure of the real scene image is preserved while irrelevant details are smoothed away, so that the colors and image content are smooth. The abstract cartoon image is then stylized to generate a style cartoon image with an artistic style. In other words, the first stage abstracts and smooths the image and abstracts and stylizes its content, so that contour edge lines are generated on the basis of the style cartoon image obtained in the first stage. This avoids inducing unnecessary lines in non-salient areas and concentrates the lines clearly on the contour edges of salient structure regions, yielding a cartoon image of higher image quality.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a method for cartoon processing of images of a real scene;
FIG. 2 is a flowchart illustrating a method for processing cartoon images of a real scene according to an embodiment;
FIG. 3 is a schematic diagram of a network architecture for a two-stage neural network in one embodiment;
FIG. 4 is a schematic representation of a cartoon image of an embodiment;
FIG. 5 is a graph illustrating the results of various states of one embodiment;
FIG. 6 is a graph showing the effect of the comparison between the present disclosure and the reference set according to one embodiment;
FIG. 7 is a comparison of different styles of cartoon images in one embodiment;
FIG. 8 is a block diagram showing a processing apparatus for cartoon real scene images in one embodiment;
FIG. 9 is a block diagram showing a processing apparatus for cartoon representation of an image of a real scene in another embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device in one embodiment;
fig. 11 is an internal configuration diagram of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for cartoonizing real scene images provided by the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
The terminal 102 may obtain a real scene image input by a user or acquired by the terminal, send the real scene image to the server 104, and the server 104 executes the method for processing the cartoon real scene image in the embodiments of the present application. The server 104 may output the generated cartoon image to the terminal 102 for presentation.
It can be understood that the terminal 102 itself may also execute the method for cartoonizing real scene images in the embodiments of the present application, in which case the terminal 102 directly converts the acquired real scene image into a cartoon image and outputs it for display. The execution subject of the method in the embodiments of the present application is not limited here; that is, either the terminal 102 or the server 104 may execute it.
It can be understood that the method for processing the cartoon real scene image in the embodiments of the present application is equivalent to using an artificial intelligence technology to automatically convert the real scene image into the cartoon image.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline involving a wide range of technologies at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
It can be understood that, in the processing method for cartoon real scene images in the embodiments of the present application, a computer vision technology in an artificial intelligence technology is used to perform image processing such as abstract smoothing processing, stylization processing, edge line generation and the like on real scene images, so as to automatically convert the real scene images into cartoon images, thereby implementing cartoon real scene image processing.
Computer Vision (CV) is the science of studying how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to perform machine vision tasks such as recognition, tracking, and measurement on a target, and further processes the images so that they become more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of obtaining information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
In addition, the method for cartoonizing real scene images in the embodiments of the present application also uses machine learning, another branch of artificial intelligence. In the embodiments of the present application, machine learning is used to understand image semantics and to perform image reconstruction and abstract smoothing. Some embodiments of the present application also involve the training and use of a deep neural network, which is described in sufficient detail below, so the present application realizes cartoonization of real scene images based on machine learning.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
In one embodiment, as shown in fig. 2, a method for processing cartoon of real scene image is provided, which is described by taking the method as an example applied to a computer device, where the computer device may be the server or the terminal in fig. 1, and includes the following steps:
step 202, acquiring a real scene image.
A real scene image is an image captured of a real-world scene.
In one embodiment, the image content of the real scene image may include at least one of an environment image content and an object image content of the real scene.
It is understood that the environment image content is used to represent the image content of the background environment in which an object is located. The object may be at least one of a person, an animal, an item, and the like. Accordingly, the object image content may be at least one of person image content, animal image content, and item image content.
In one embodiment, the real scene image may be a single picture or an image in a sequence of video frames.
In one embodiment, the sequence of video frames may be the sequence of image frames in an already generated video file; that is, each real scene image may be one frame of the video file. This enables cartoonization of videos.
In one embodiment, the sequence of video frames may also be the sequence of image frames in a real-time video stream; that is, each real scene image may be one frame of the real-time video stream. This enables real-time cartoonization of videos.
The method in each embodiment of the present application cartoonizes the real scene image so as to convert it into a cartoon image. A cartoon image is a non-realistic image that is reproduced by extracting characteristic elements from a natural, real prototype (i.e., the real scene image) and rendering them with artistic techniques.
Step 204, performing image reconstruction and abstract smoothing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain; the abstract cartoon image retains the salient structure of the real scene image but lacks its contour edge lines.
Semantic information refers to the feature information of the image.
Image reconstruction refers to the process of reconstructing and generating the image.
Abstract smoothing, i.e., filtering, is used to eliminate unimportant details in the image and to abstract the image content into a cartoon image. It can be understood that, since a real scene image contains rich image details that should not appear in a cartoon image, abstract smoothing is performed to remove unnecessary interference and to smooth the distribution of colors and the like.
The salient structure is the structural feature of a key area in a real scene image. I.e. key image content in the real scene. The contour edge line refers to an edge contour line of image content in a real scene image. It will be appreciated that each region in the image of the real scene has image content, and each portion of image content has a corresponding edge contour.
The abstract cartoon image is a non-realistic cartoon image obtained by mapping the real scene image from the real domain onto the cartoon domain. It can be understood that, in the embodiments of the present application, performing image reconstruction on the real scene image based on its semantic information is equivalent to reconstructing the image while taking its key features into account, so the generated abstract cartoon image retains the salient structure of the real scene image. Moreover, by combining image reconstruction with abstract smoothing, unnecessary details are removed, so the abstract cartoon image lacks unnecessary details such as the contour edge lines of the real scene image and becomes smoother.
Specifically, the computer device may input the real scene image to a pre-trained deep neural network to perform semantic extraction processing on the real scene image, perform image reconstruction processing on the real scene image based on the extracted semantic information, and then perform abstract smoothing processing on the reconstructed image to obtain an abstract cartoon image of the real scene image mapped on the cartoon domain.
In one embodiment, the pre-trained deep neural network may include at least two stages of neural networks. Therefore, the computer device may input the real scene image into the neural network of the first stage in the deep neural network. It is understood that in other embodiments, the deep neural network may be only a single stage neural network.
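To make the two-stage organization concrete, the following is a minimal, hypothetical PyTorch sketch of the inference pipeline. The generator architectures (StageOneGenerator, StageTwoGenerator) and the function name cartoonize are illustrative assumptions, not the exact networks of this application.

```python
# Hypothetical inference sketch for a two-stage cartoonization pipeline.
# The concrete generator architectures are placeholders, not the networks of this application.
import torch
import torch.nn as nn

class StageOneGenerator(nn.Module):
    """First stage: semantic extraction, image reconstruction, abstract smoothing and
    stylization (placeholder architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class StageTwoGenerator(nn.Module):
    """Second stage: draws contour edge lines on the style cartoon image (placeholder)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def cartoonize(real_image: torch.Tensor,
               g1: StageOneGenerator,
               g2: StageTwoGenerator) -> torch.Tensor:
    """real_image: (1, 3, H, W) tensor in [-1, 1]."""
    style_cartoon = g1(real_image)   # stage 1: abstract, smoothed and stylized cartoon image
    cartoon = g2(style_cartoon)      # stage 2: add contour edge lines
    return cartoon
```

In this reading, the first stage corresponds to steps 204 and 206 below, and the second stage to step 208.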
And step 206, stylizing the abstract cartoon image to generate a style cartoon image with artistic style.
The stylization processing refers to processing for giving an artistic style to an abstract cartoon image.
The artistic style refers to the comprehensive, overall character expressed in literary and artistic creation, i.e., the representative appearance of an artwork as a whole. The style cartoon image is the abstract cartoon image from step 204 with an artistic style added.
It is understood that different artists have different artistic styles. Therefore, by stylizing the abstract cartoon image, a style cartoon image similar to the creation style of a particular artist can be generated; for example, the abstract cartoon image may be stylized into a style cartoon image that imitates the creative style of a specific cartoonist.
It should be noted that the conversion of the abstract cartoon image into a style cartoon image with a fixed artistic style is not limited herein. In fact, the abstract cartoon image can be stylized into stylized cartoon images with different artistic styles respectively as required.
In one embodiment, the computer device may also stylize the abstract cartoon image using the deep neural network of step 204 to generate a stylized cartoon image having an artistic style.
In another embodiment, the computer device may also use a preset stylized template to perform image superposition and fusion processing on the abstract cartoon image and the stylized template to generate a stylized cartoon image with artistic style. For example, an artistic style can be added to the abstract cartoon image by adding a filter (i.e., a stylized template).
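For this filter/template variant, a simple superposition and fusion can be expressed as pixel-wise alpha blending. The sketch below is an assumption for illustration; the blend weight alpha and the exact form of the style template are not specified by this application.

```python
# Hypothetical stylization by superposing a preset style template (a "filter").
# The alpha weight and the template image are assumptions for illustration.
import numpy as np
from PIL import Image

def apply_style_template(abstract_cartoon: Image.Image,
                         style_template: Image.Image,
                         alpha: float = 0.3) -> Image.Image:
    style_template = style_template.resize(abstract_cartoon.size).convert("RGB")
    base = np.asarray(abstract_cartoon.convert("RGB"), dtype=np.float32)
    tmpl = np.asarray(style_template, dtype=np.float32)
    fused = (1.0 - alpha) * base + alpha * tmpl   # pixel-wise superposition and fusion
    return Image.fromarray(fused.clip(0, 255).astype(np.uint8))
```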
It will be appreciated that, compared with the abstract cartoon image, the style cartoon image adds artistic style information but still lacks contour edge lines.
Step 208, generating contour edge lines for the style cartoon image to obtain the cartoonized image of the real scene image.
The cartoonized image of the real scene image is a cartoon image that has contour edge lines, retains the salient structure of the real scene image, and has an artistic style.
Specifically, the computer device may perform edge line generation on the style cartoon image to produce its contour edge lines, thereby obtaining the cartoonized image of the real scene image.
In one embodiment, when the deep neural network in step 204 includes at least two stages of neural networks, the computer device may input the real scene image into the neural network of the second stage in the deep neural network to perform edge line generation processing on the style cartoon image.
In other embodiments, the computer device may also train a machine learning model that is dedicated to generating edge lines (i.e., independent of the machine learning model of step 204) to generate contour edge lines for stylized cartoon images. In addition, the computer equipment can also use an edge detection operator to search the image contour edge in the style cartoon image and generate a contour edge line, so that the cartoon image after the cartoon image of the real scene is cartoon-ized is obtained.
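As an illustration of the edge-detection-operator variant, the following sketch uses the Canny operator from OpenCV; the thresholds and the way the lines are overlaid on the style cartoon image are assumptions, not the method of this application.

```python
# Hypothetical contour-edge-line generation with a classical edge detection operator.
# Thresholds and the overlay color are illustrative assumptions.
import cv2
import numpy as np

def add_contour_edge_lines(style_cartoon_bgr: np.ndarray,
                           low: int = 80, high: int = 160) -> np.ndarray:
    gray = cv2.cvtColor(style_cartoon_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)       # binary map of image contour edges
    cartoon = style_cartoon_bgr.copy()
    cartoon[edges > 0] = (0, 0, 0)           # draw dark contour edge lines
    return cartoon
```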
The above method for cartoonizing real scene images deeply understands the key semantic information of the real scene image and performs image reconstruction and abstract smoothing on it based on that semantic information to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain. Because image reconstruction and abstract smoothing are combined, the salient structure of the real scene image is preserved while irrelevant details are smoothed away, keeping colors and image content smooth. The abstract cartoon image is then stylized to generate a style cartoon image with an artistic style. This is equivalent to abstractly smoothing the image (to smooth the non-salient structure) and abstracting and stylizing its content in the first stage, so that contour edge lines are generated on the basis of the style cartoon image obtained in the first stage. As a result, unnecessary lines are not induced in non-salient regions, the lines are clearly concentrated on the contour edges of salient structure regions, and a cartoon image of higher image quality is obtained.
In one embodiment, the real scene image is extracted from a sequence of video frames in sequence; the video frame sequence comprises at least two real scene images which are arranged in sequence.
It will be appreciated that the sequence of video frames may be a sequence of image frames in a video file, or may be a sequence of image frames in a real-time video stream.
Then, according to the method in the embodiments of the present application, a corresponding cartoon image can be generated for each real scene image in the sequence of video frames.
In one embodiment, the method further comprises: when the video frame sequence is an image frame sequence in a video file, generating a cartoon video file according to the cartoonized image of each real scene image in the video frame sequence.
Specifically, when the sequence of video frames is the sequence of image frames in a video file, the computer device may apply the method in the embodiments of the present application to each real scene image in the video frame sequence in order, converting each real scene image into a cartoon image, and then generate a cartoon video file from the cartoonized images of the real scene images.
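A frame-by-frame conversion of a video file could be sketched as follows with OpenCV, assuming a cartoonize_frame function that wraps the trained two-stage network; the codec and the function names are placeholders.

```python
# Hypothetical sketch: convert a video file into a cartoonized video file frame by frame.
# cartoonize_frame is assumed to wrap the trained two-stage network; paths and codec are placeholders.
import cv2

def cartoonize_video(src_path: str, dst_path: str, cartoonize_frame) -> None:
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()            # each frame is one real scene image
        if not ok:
            break
        writer.write(cartoonize_frame(frame))
    cap.release()
    writer.release()
```

The same loop applies to a real-time video stream when the frames are read from a camera or network source instead of a file.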
It can be understood that this way is equivalent to generating a corresponding cartoonized video file by the method in the embodiment of the present application before playing the video file, and then directly playing the cartoonized video file.
In one embodiment, the method further comprises: and when the video frame sequence is an image frame sequence in a video file, acquiring the cartoon images corresponding to the real scene images in the video frame sequence in real time and outputting the cartoon images when the video file is played.
Specifically, in this embodiment, after each real scene image in the video frame sequence has been converted into a corresponding cartoon image, the computer device may record the correspondence between the real scene images and the cartoon images instead of generating a separate cartoon video file. Then, when the video file is subsequently played, i.e., when the video frame sequence corresponding to the video file is played, the computer device may output, in real time and according to the recorded correspondence, the cartoonized image of each real scene image in the video frame sequence.
In an embodiment, when the computer device is a terminal, the terminal may display, in real time, the cartoonized image of each real scene image in the video frame sequence after detecting that the user has triggered the cartoon play mode for the video file. When the cartoon play mode has not been triggered for the video file, i.e., in the normal play mode, the computer device may play the video file directly. It can be understood that this embodiment provides two play modes for one video file (a normal play mode and a cartoon play mode).
When the computer device is a server, the cartoon images corresponding to the real scene images in the video frame sequence of the played video file are output from the server to the terminal in real time while the video file is played on the terminal, and the terminal then displays the cartoon images in real time.
In one embodiment, the method further comprises: when the video frame sequence is an image frame sequence in a real-time video stream, outputting a cartoonized video stream in real time according to the cartoonized images of the real scene images in the video frame sequence.
It can be understood that, when the sequence of video frames is the sequence of image frames in a real-time video stream, the computer device may apply the method in the embodiments of the present application to each real scene image as it is acquired from the video stream, converting it into a corresponding cartoon image and displaying that cartoon image in real time. In this way, the sending end transmits the video stream of real scene images in real time, while the display end shows the cartoonized images of those real scene images in real time, i.e., the cartoonized video stream is output in real time.
It can be understood that, when the computer device is a terminal, outputting the cartoonized video stream in real time refers to outputting the display cartoonized video stream in real time. When the computer equipment is a server, outputting the cartoon video stream in real time refers to outputting the cartoon video stream to the terminal in real time so that the terminal can display the cartoon video stream in real time.
Some scenarios are now given as examples. In a live-streaming scenario, the sending end collects real scene images of the live video stream and sends them to the server; the server cartoonizes the real scene images to generate cartoon images and streams them in real time to the receiving end, so that the real scene in which the anchor is located is converted into a cartoon scene and presented to the users watching the live stream. In a video call scenario in instant messaging, the real scene images of both parties can be converted into cartoon images and displayed in real time as video streams, so that both parties see cartoon dynamic scenes. Likewise, when videos are watched online on video content platforms, the user can select a cartoon play mode for an original online video composed of real scene images, and the cartoonized video stream is output in real time, so that the cartoon images corresponding to the real scene images are played.
In the embodiment, the video file or the video stream in the real scene can be converted into the cartoon video file or the cartoon video stream, so that the diversity of the video scene is realized. In addition, the generated cartoon video file or cartoon video stream has high-quality cartoon images, and the picture quality of the video file or video stream is improved.
In one embodiment, the image reconstruction processing and the abstract smoothing processing are performed on the real scene image based on the semantic information of the real scene image, and obtaining the abstract cartoon image of the real scene image mapped on the cartoon domain includes: inputting the real scene image into the trained deep neural network to perform semantic extraction on the real scene image, performing image reconstruction processing on the real scene image based on the extracted semantic information, and performing bilateral filtering smoothing processing on the reconstructed image content to generate an abstract cartoon image.
The deep neural network is used for extracting semantic information of a real scene image and performing image reconstruction processing and bilateral filtering smoothing processing based on the semantic information. Bilateral filtering smoothing refers to filtering abstract smoothing processing on an image from two dimensions, namely a pixel value and a spatial position.
It should be noted that the deep neural network is not limited to be capable of implementing only the above-described processing, and may implement other processing.
The deep neural network may be a single neural network or a multi-stage neural network. It is understood that a multi-stage neural network includes at least two stages of neural networks.
When the deep neural network is an independent neural network, the computer device may separately iteratively train the deep neural network in advance according to the sample data set to obtain a final deep neural network. When the deep neural network is a multi-stage neural network, the computer device may perform iterative training on the neural networks of the stages included in the multi-stage neural network together, thereby implementing training on the multi-stage deep neural network.
In this embodiment, in the first stage, the deep neural network deeply understands the key semantic information of the real scene image and performs image reconstruction and abstract smoothing on it based on that semantic information to obtain an abstract cartoon image that maps the real scene image onto the cartoon domain. Combining image reconstruction with bilateral filtering smoothing preserves the salient structure of the real scene image while smoothing away irrelevant details, which improves the accuracy of the image processing. Moreover, bilateral filtering performs the filtering-based abstract smoothing from two dimensions, pixel value and spatial position, which improves the accuracy of the abstract smoothing and the quality of the processed image. A high-quality cartoon image can subsequently be generated on the basis of the abstract cartoon image produced in this first stage.
In one embodiment, the real scene image is input into a first generator of an abstract network in a deep neural network. In this embodiment, the training step of the deep neural network includes: acquiring a sample data set; the sample data set comprises a first set of real scene sample graphs; inputting a sample data set into a deep neural network to be trained for iterative training, determining a structure reconstruction loss value and a bilateral filtering smooth loss value in each iteration, determining a target loss value according to the structure reconstruction loss value and the bilateral filtering smooth loss value, and adjusting network model parameters in each iteration according to the target loss value until the iteration is stopped to obtain the trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network.
It is understood that in the embodiment of the present application, the deep neural network includes an abstract network, the abstract network includes a first generator, and the computer device inputs the real scene image into the first generator to generate the abstract cartoon image. Namely, the first generator of the abstract network is used for extracting and carrying out image reconstruction and bilateral filtering smoothing processing based on the semantic information of the real scene image, thereby generating the abstract cartoon image.
The abstract cartoon image may be the final output data of the first generator or an intermediate result generated by the first generator, i.e. the first generator may further process the abstract cartoon image and then output the abstract cartoon image accordingly.
The deep neural network is a pre-trained neural network. The sample data set and specific training steps for training the deep neural network will be described below.
And the sample data set is a training set for training the deep neural network. The sample data set includes a first set of real scene sample graphs, but is not limited to only include the first set, and may include other sets of sample data. That is, the sample data set may include a plurality of different types of sample sets, where the first set is a type of sample set (i.e., a set of real scene sample graphs).
In one embodiment, the first set may include an original real scene sample map, and may also include a real scene sample map preprocessed with respect to the original real scene sample map.
Specifically, the computer device may obtain an original real scene sample map, randomly cut the original real scene sample map into an image with a preset size, adjust the image to a preset resolution, use the image after the size adjustment and the resolution adjustment as a final real scene sample map, further form a first set, and input the first set into the deep neural network for training.
For example, 5000 real-world photos (i.e., original real scene sample images) are collected. 5000 real-world photos are randomly cut into small blocks with preset sizes (for example, the size is 500 × 500), and then the resolution is adjusted to be 256 × 256, so that the adjusted images are the first set, namely the set of real scene sample images. The adjusted images (i.e., the first set) may then be input into a deep learning network to perform training. It can be understood that the original real scene sample graph is preprocessed through size adjustment and resolution adjustment and then is regenerated into the first set, so that the accuracy of subsequent deep learning network training can be improved, and the data processing pressure can be reduced.
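The preprocessing described in this example (random 500 x 500 crops resized to 256 x 256) could be written with torchvision transforms as in the sketch below; the interpolation mode and any normalization are assumptions not specified in the example.

```python
# Hypothetical preprocessing of the original real scene sample photos into the first set.
# Crop size 500 and target resolution 256 follow the example above; other details are assumptions.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.RandomCrop(500),        # randomly cut a 500 x 500 patch (photo must be at least 500 x 500)
    transforms.Resize((256, 256)),     # adjust the resolution to 256 x 256
    transforms.ToTensor(),             # to a [0, 1] tensor of shape (3, 256, 256)
])

# first_set = [preprocess(photo) for photo in real_scene_photos]  # real_scene_photos: list of PIL images
```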
The target loss value refers to a final loss value in each iteration when the deep neural network is iteratively trained.
In one embodiment, the computer device may obtain a structural reconstruction loss function and a bilateral filtering smoothing loss function designed in advance for the abstract network, further construct a target loss function according to the structural reconstruction loss function and the bilateral filtering smoothing loss function, and input the sample data set into the deep neural network to be trained for iterative training. And adjusting the network model parameters of the deep neural network in each iteration to minimize the value of the target loss function until the iteration is stopped, so as to obtain the trained deep learning network.
It can be understood that, in each iteration, the computer device may calculate a structure reconstruction loss value and a bilateral filtering smoothing loss value according to a structure reconstruction loss function and a bilateral filtering smoothing loss function in the target loss function, determine a target loss value corresponding to each iteration according to the structure reconstruction loss value and the bilateral filtering smoothing loss value, and adjust a network model parameter of the deep neural network according to the target loss value to find a minimized target loss value, so that the deep neural network gradually tends to converge until the iteration is stopped, thereby obtaining a trained deep learning network.
It should be noted that supervised iterative training is performed according to the structural reconstruction loss function, and unsupervised iterative training is performed according to the bilateral filtering smoothing loss function. Namely, the image content reconstruction and abstract smoothing processing are carried out by using the supervised structural reconstruction loss and the unsupervised bilateral filtering smoothing loss.
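One iteration of this first-stage training could be sketched as follows, assuming that structure_reconstruction_loss and bilateral_smoothing_loss implement the two loss terms defined below and that the target loss is their weighted sum; the loss weights and the optimizer are assumptions, since the application does not fix them here.

```python
# Hypothetical single training iteration for the abstract network (first stage).
# Loss weights, optimizer and helper functions are illustrative assumptions.
import torch

def train_step(generator, optimizer, real_batch, edge_mask_batch,
               structure_reconstruction_loss, bilateral_smoothing_loss,
               w_ssc=1.0, w_fla=1.0):
    optimizer.zero_grad()
    fake_cartoon = generator(real_batch)
    loss_ssc = structure_reconstruction_loss(fake_cartoon, real_batch, edge_mask_batch)
    loss_fla = bilateral_smoothing_loss(fake_cartoon, real_batch)
    target_loss = w_ssc * loss_ssc + w_fla * loss_fla   # target loss value for this iteration
    target_loss.backward()
    optimizer.step()                                     # adjust the network model parameters
    return target_loss.item()
```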
It can be understood that, because the deep neural network includes the abstract network, the training of the abstract network is necessarily involved in the training process of the deep neural network, and the adjustment of the parameters of the first generator is necessarily involved in the adjustment of the parameters of the network model. Therefore, the trained deep neural network includes the trained abstract network.
The target loss value is not limited to be determined only from the structure reconstruction loss value and the bilateral filtering smoothing loss value. When the deep neural network further comprises other networks or has other processing, the target loss value can be determined together with other loss values.
The structure reconstruction loss value is used for representing the difference of the structure characteristics between the cartoon image generated by the first generator according to the real scene sample diagram and the real scene sample diagram. It is understood that the structural reconstruction loss function is considered when constructing the target loss function, so as to limit the structural similarity between the input real scene image and the cartoon image output by the first generator, i.e., to enable the cartoon image generated by the first generator to retain the significant structure (i.e., retain the key structural features) in the input real scene sample map.
In one embodiment, the structural reconstruction loss function for an abstract network design is as follows:
L_ssc = Σ (1 + ρB) ||G(I) − I||_2
where L_ssc is the structure reconstruction loss function; G(I) denotes the cartoon image generated by the first generator (i.e., the predicted abstract cartoon image); I denotes the input real scene sample image; B denotes a binary mask that is 1 at structure edges and 0 elsewhere; and ρ is the weight given to salient edges. It is understood that the binary mask may be generated with a pre-trained line tracing network, i.e., the structural edge lines of the real scene sample image are extracted by the line tracing network and the resulting line image is converted into the binary mask.
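A direct reading of this loss in PyTorch might look like the following sketch; the value of ρ, the batch averaging, and taking the per-pixel L2 norm over color channels are assumptions consistent with, but not dictated by, the formula above.

```python
# Hypothetical implementation of the structure reconstruction loss L_ssc.
# rho and the per-pixel L2 norm over color channels are assumptions for illustration.
import torch

def structure_reconstruction_loss(fake_cartoon: torch.Tensor,   # G(I), shape (B, 3, H, W)
                                  real_image: torch.Tensor,     # I,    shape (B, 3, H, W)
                                  edge_mask: torch.Tensor,      # B,    shape (B, 1, H, W), 1 at structure edges
                                  rho: float = 3.0) -> torch.Tensor:
    per_pixel = torch.norm(fake_cartoon - real_image, p=2, dim=1, keepdim=True)  # ||G(I) - I||_2
    weighted = (1.0 + rho * edge_mask) * per_pixel
    return weighted.sum() / real_image.shape[0]   # average over the batch (assumption)
```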
And the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity. It can be understood that the bilateral filtering smoothing loss function is considered when constructing the target loss function, so as to comprehensively consider the similarity of pixel values and the similarity of spatial positions to perform abstract smoothing processing on the image. That is, the image is subjected to filter abstract smoothing processing from two dimensions, i.e., pixel values and spatial positions.
In particular, the computer device may obtain a pre-designed bilateral filtering smoothing loss function. Then, in each iteration process, a bilateral filtering smoothing loss function can be combined, and comprehensive consideration is carried out from two aspects of pixel value similarity and spatial position similarity, so that a bilateral filtering smoothing loss value is obtained. It will be appreciated that the bilateral filtering smoothing loss value can be used to characterize the difference between each pixel and its neighbors in the cartoon image generated by the first generator. Thus, it is equivalent to determine the difference between each pixel and its neighboring pixels in the cartoon image generated by the first generator by taking a comprehensive consideration of the similarity of pixel values and the similarity of spatial positions.
In one embodiment, the bilateral filtering smoothing loss function includes two kernel functions, a spatial kernel and a pixel range kernel; that is, the final weight coefficient function of the bilateral filtering smoothing loss is determined jointly by the spatial kernel and the pixel range kernel. The spatial kernel is the weight coefficient function of the spatial domain and smooths according to the spatial neighborhood of a pixel, i.e., it determines the weight coefficient from spatial position similarity. The pixel range kernel is the weight coefficient function of the pixel range domain and smooths according to the pixel value (color) difference between adjacent pixels, i.e., it determines the weight coefficient from pixel value similarity. The final weight coefficient function therefore takes both pixel value similarity and spatial position similarity into account, so the computed bilateral filtering smoothing loss value considers both aspects.
In one embodiment, the bilateral filtering smoothing loss function L_fla is given by:
L_fla = (1/N) · Σ_i Σ_{j ∈ N_S(i)} w_sp(i, j) · w_co(i, j) · ‖G(I)_i − G(I)_j‖_2;
wherein L_fla is the bilateral filtering smoothing loss function; N is the total number of pixels; N_S(i) denotes the H × H neighborhood of pixel i, and j denotes a neighboring pixel of pixel i within that neighborhood; ‖·‖_2 denotes the L2 norm (without squaring). The weight coefficient function of the spatial domain is defined as
w_sp(i, j) = exp( −[(x_i − x_j)² + (y_i − y_j)²] / (2σ_sp²) );
and the weight coefficient function of the pixel range domain is defined as
w_co(i, j) = exp( −Σ_c (I_{i,c} − I_{j,c})² / (2σ_co²) ).
Here σ denotes the standard deviation of a Gaussian kernel function: σ_sp is the standard deviation of the spatial kernel and σ_co is the standard deviation of the pixel range kernel. c denotes a color channel, and x and y denote the spatial position coordinates of a pixel; x_i, y_i are the spatial position coordinates of pixel i, and x_j, y_j are those of pixel j. I represents the input real scene sample image, and I_{i,c} − I_{j,c} represents the pixel value difference between pixel i in I and its neighboring pixel j. G(I)_i represents pixel i in the cartoon image generated by the first generator, and G(I)_j represents pixel j in the cartoon image generated by the first generator.
In one embodiment, σcoAnd σspMay be set to 0.2 and 5, respectively.
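A possible realization of this loss, written as a PyTorch-style sketch under the assumptions that pixel values are normalized to [0, 1], that the H × H neighborhood is approximated by shifted copies of the image (with wrap-around at the borders), and that the helper name and tensor layout are illustrative only:

    import math
    import torch

    def bilateral_smoothing_loss(generated, real, half_window=2, sigma_sp=5.0, sigma_co=0.2):
        # generated: G(I), real: I, both of shape (N, C, H, W), values in [0, 1].
        loss = 0.0
        for dy in range(-half_window, half_window + 1):
            for dx in range(-half_window, half_window + 1):
                if dy == 0 and dx == 0:
                    continue
                shifted_real = torch.roll(real, shifts=(dy, dx), dims=(2, 3))
                shifted_gen = torch.roll(generated, shifts=(dy, dx), dims=(2, 3))
                # spatial-domain weight: depends only on the offset between pixel i and pixel j
                w_sp = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_sp ** 2))
                # pixel-range weight: depends on the color difference in the input image I
                color_diff = ((real - shifted_real) ** 2).sum(dim=1)
                w_co = torch.exp(-color_diff / (2.0 * sigma_co ** 2))
                # L2 difference (not squared) between neighboring pixels of G(I)
                gen_diff = torch.sqrt(((generated - shifted_gen) ** 2).sum(dim=1) + 1e-8)
                loss = loss + (w_sp * w_co * gen_diff).mean()   # mean over pixels ~ (1/N) * sum
        return loss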
In the above embodiment, the supervised structural reconstruction loss and the unsupervised bilateral filtering smoothing loss are combined to train the deep neural network for image content reconstruction and abstract smoothing. The trained deep neural network can then perform high-quality image content reconstruction and abstract smoothing, which improves the quality of the images generated subsequently.
In one embodiment, stylizing the abstract cartoon image to generate a stylized cartoon image having an artistic style comprises: in the first stage, the abstract cartoon image is stylized through a first generator to generate a style cartoon image with artistic style.
It is understood that the first generator can further stylize the abstract cartoon image to generate a stylized cartoon image having an artistic style, in addition to generating the abstract cartoon image.
In this embodiment, since the first generator performs the stylization processing, stylization-related training is also performed during training. That is, the abstract network undergoes style migration training so that its first generator can perform the stylization processing.
In one embodiment, the abstract network further comprises a first discriminator. In the embodiment of the present application, the abstract network is a generative adversarial network (GAN) including a first generator and a first discriminator. Therefore, the network model parameters of the deep neural network further include parameters of the first discriminator.
In this embodiment, the sample data set further includes a second set of abstract cartoon sample diagrams. The abstract cartoon sample graph is an abstract image with missing contour edge lines. It will be appreciated that the abstract cartoon sample drawings in the second collection all have the same artistic style.
It will be appreciated that the purpose of the first discriminator is to distinguish the cartoon image generated by the first generator from the style of the abstract cartoon sample map lacking contour edge lines. The first generator and the first discriminator are adversarial: through iterative adversarial training, the first generator learns to produce cartoon images that the first discriminator cannot distinguish, so that the generated cartoon image gradually approaches the style of the abstract cartoon sample map and the style characteristics of the abstract cartoon sample map are introduced into the cartoon image generated by the first generator.
In one embodiment, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value based on the structural reconstruction loss value and the bilateral filtering smoothing loss value comprises: in each iteration, determining a structural reconstruction loss value, a bilateral filtering smoothing loss value and a style enhancement loss value, and determining a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value and the style enhancement loss value.
Specifically, the computer device may obtain a pre-designed style enhancement loss function, and construct a target loss function according to the structure reconstruction loss function, the bilateral filtering smoothing loss function, and the style enhancement loss function. It will be appreciated that the style enhancement loss function is considered in constructing the target loss function in order for the cartoon image generated by the first generator to have the style of an abstract cartoon sample map, i.e., to enable the first generator to generate a cartoon image having the style of an abstract cartoon sample map.
When training the deep neural network, the computer device may input a sample data set including a first set and a second set into the deep neural network together for iterative training, randomly select a real scene sample diagram from the first set in each iteration, randomly select a preset number (for example, 5) of abstract cartoon sample diagrams from the second set as a reference, combine with a target loss function to obtain a structural reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value corresponding to the current iteration, further determine a final target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value, thereby iteratively adjusting network model parameters according to the target loss value, wherein the network model parameters include parameters of a first generator and parameters of a first discriminator, until the iteration stops. Likewise, the target loss value is not limited and may include other loss values. That is, the target loss function may also include other loss functions.
It is to be understood that the stylistic enhancement loss value is determined by a first discriminator based on a first probability of a cartoon image output generated by a first generator. That is, in each iteration, the cartoon image generated by the first generator in the current iteration may be input to the first discriminator, and the first discriminator may output a probability value, that is, a first probability, according to the input cartoon image, and further determine the style enhancement loss value according to the first probability. The first probability is used for representing the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample graph. It is understood that the greater the first probability, the closer the cartoon image generated by the first generator is to the style of the abstract cartoon sample map, the harder it is for the first discriminator to recognize.
In one embodiment, the adversarial loss function L_sty-D of the first discriminator is as follows:
L_sty-D = Σ [ log D(A_y) + log(1 − D(G(I))) ];
and the style enhancement loss function L_sty of the first generator may be expressed, for example, as:
L_sty = Σ log(1 − D(G(I)));
wherein A_y ∈ {a_y | y = 1, …, Y} ⊆ A, A being the second set and a_y the y-th abstract cartoon sample map; I and G(I) respectively represent the input real scene sample image and the cartoon image generated by the first generator; D is the first discriminator and D(·) the output of the first discriminator; D(A_y) is the output obtained after the y-th abstract cartoon sample map A_y is input to the first discriminator, and D(G(I)) is the first probability obtained after the cartoon image generated by the first generator is input to the first discriminator.
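The two adversarial terms can also be written as losses to be minimized, roughly as in the following PyTorch-style sketch; the sign convention, the use of the mean instead of the sum, and the small eps added for numerical stability are assumptions of the sketch, not requirements of this embodiment.

    import torch

    def style_discriminator_loss(d_real, d_fake, eps=1e-8):
        # d_real: D(A_y) for the sampled abstract cartoon reference images;
        # d_fake: D(G(I)) for cartoon images produced by the first generator.
        # The first discriminator pushes D(A_y) toward 1 and D(G(I)) toward 0.
        return -(torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean())

    def style_enhancement_loss(d_fake, eps=1e-8):
        # The first generator is trained to make D(G(I)) large, i.e. to minimize log(1 - D(G(I))).
        return torch.log(1.0 - d_fake + eps).mean()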
It should be noted that the structure reconstruction loss value is determined in a supervised manner, while the bilateral filtering smoothing loss value and the style enhancement loss value are determined in an unsupervised manner. Image content reconstruction and abstract smoothing are thus performed based on the supervised structure reconstruction loss value and the unsupervised bilateral filtering smoothing loss value, and stylized migration is performed using the unsupervised generative adversarial loss, so that in the first stage a high-quality, abstractly smoothed and stylized cartoon image is generated.
In one embodiment, the obtaining of the second set comprises: acquiring a third set of original cartoon images; extracting contour edge lines of each original cartoon image according to a pre-trained line tracking network; generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from an original cartoon image.
Wherein the original cartoon images in the third set have the same artistic style. The abstract cartoon sample drawings in the second collection then also have the same artistic style as the original cartoon images in the third collection.
It should be noted that, according to needs, an original cartoon image with a desired artistic style may be prepared to obtain a third set, so that an abstract cartoon sample drawing belonging to the artistic style is obtained based on the third set, and the abstract cartoon sample drawing is trained as the second set, so that the desired cartoon image with the artistic style may be flexibly generated. Namely, cartoon images with different artistic styles can be generated according to training data with different artistic styles.
For example, assuming that a style a is desired to be generated, a third set of styles a may be prepared and trained to obtain a second set of styles a, and the trained first generator may convert the real scene images into cartoon images of style a. Assuming that the B style is desired to be generated, a third set of B styles may be prepared, and trained to obtain a second set of B styles, and the trained first generator may convert the real scene images into B-style cartoon images.
The line tracing network is a deep neural network trained in advance for extracting edge lines. That is, it extracts the edge lines of the salient regions of an image and ignores unnecessary lines in non-salient regions.
In one embodiment, the line tracing network, which may be a fully convolutional deep neural network, includes an encoder and a decoder.
Specifically, the computer device may train a line tracing network in advance, input the preset third set of original cartoon images into the line tracing network, and extract the contour edge lines of each original cartoon image in the third set through the line tracing network. The computer device may convert the line image into a binary mask, in which all edge pixels are assigned 1 and all other pixels are assigned 0. Further, the computer device may use a Gaussian filter (such as a Gaussian filter with a standard deviation of 5) to generate a Gaussian-blurred version of the cartoon image. The computer device may then multiply the Gaussian-blurred version of the cartoon image by the binary mask to obtain a blurred edge region, and replace the corresponding edge region in the original cartoon image with the blurred edge region, thereby obtaining an abstract cartoon sample image with missing edge lines and, in turn, the second set. It can be understood that the original cartoon images and the abstract cartoon sample images generated from them correspond to each other one by one.
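The construction of one abstract cartoon sample image from an original cartoon image can be sketched as follows (an OpenCV/NumPy illustration; the function name, the blur parameter and the threshold used to binarize the line image are assumptions made for the sketch):

    import cv2
    import numpy as np

    def make_abstract_cartoon_sample(cartoon, line_image, blur_sigma=5.0):
        # cartoon: original cartoon image, uint8 array of shape (H, W, 3);
        # line_image: contour edge lines extracted by the pre-trained line tracing network, (H, W).
        mask = (line_image > 127).astype(np.float32)[..., None]      # binary mask: 1 on edge pixels
        blurred = cv2.GaussianBlur(cartoon, (0, 0), blur_sigma)       # Gaussian-blurred version
        # replace the edge regions of the original cartoon with their blurred counterparts,
        # which removes the contour edge lines and yields the abstract cartoon sample image
        abstract = cartoon.astype(np.float32) * (1.0 - mask) + blurred.astype(np.float32) * mask
        return abstract.astype(np.uint8)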
In one embodiment, the training step of the line tracing network comprises: obtaining a line tracing data set, wherein the line tracing data set comprises a plurality of groups of cartoon-line image pairs, and each group comprises a sample cartoon image and its corresponding line image; and performing supervised iterative training on the line tracing network according to the line tracing data set to obtain the final line tracing network.
It will be appreciated that the line tracing network may be trained in a supervised manner using the L1 norm loss function. To enhance the diversity of the data and the tolerance to background regions, some background clutter images may be included in the line tracing data set, making the line tracing network focus on important structural edges. With the line tracing network, contour edge lines of cartoon images of various styles can be extracted.
In the embodiment, the contour edge lines of the original cartoon images can be accurately extracted through the pre-trained line tracking network, so that an abstract cartoon sample image with higher reference value can be generated according to the extracted contour edge lines and the original cartoon images. Then, a more accurate deep neural network can be trained based on the abstract cartoon sample graph, and a cartoon image with higher quality can be generated based on the deep neural network.
In one embodiment, the deep learning network is a two-stage neural network, and the abstract network is the first-stage abstract network in the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network. In this embodiment, the step 206 of generating contour edge lines of the style cartoon image to obtain the cartoon image after cartoonization of the real scene image includes: in the second stage, inputting the style cartoon image generated in the first stage into a second generator in the second-stage line drawing network of the trained two-stage neural network, and performing contour edge line generation processing on the style cartoon image through the second generator to obtain the cartoon image after cartoonization of the real scene image; the second-stage line drawing network is a deep neural network for generating contour edge lines in the second stage.
The two-stage neural network comprises two stages of neural networks. It is understood that the neural networks of different stages are used to perform different processing, and the output of the neural network of the previous stage is used as the input data of the neural network of the subsequent stage. It should be noted that, during training of the two-stage neural network, the inputs at different stages may be different sample sets, or sample sets of the same type, in the sample data set.
The two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network. The first-stage abstract network is the deep neural network that performs abstract smoothing and stylization in the first stage. The second-stage line drawing network is the deep neural network that describes the contour edge lines of the image output by the first stage. That is, the image output by the first stage is input to the second generator in the second-stage line drawing network, so that the second generator performs contour edge line generation processing on the style cartoon image and outputs the final cartoon image after cartoonization of the real scene image.
In the above embodiment, based on the two-stage neural network, the image is abstractly smoothed and stylized (i.e., performed in the first stage) and the edge generation processing (i.e., performed in the second stage) is performed in stages, and the contour edge lines are added to the result output in the first stage, so that lines are prevented from being generated in an area needing smoothing, and harmony and unity of line feeling and image content smoothing are achieved.
In one embodiment, the abstract cartoon sample map is an abstract image with contour edge lines removed from the original cartoon image; the sample data set also includes a third set of original cartoon images corresponding to the abstract cartoon sample graph. In this embodiment, in each iteration, determining a structural reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value, and determining a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value includes: in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value.
It is understood that the computer device may obtain a pre-designed edge line distribution penalty function. That is, the target loss function also includes an edge line distribution loss function.
Specifically, in each iteration training, the computer device may input the abstract cartoon sample diagram into a second generator of the line description network to be trained, so as to perform supervised iteration training on the line description network, and in each iteration, according to the edge line distribution loss function, determine a difference between an image with a line generated by the second generator and an original cartoon image corresponding to the abstract cartoon sample diagram after the abstract cartoon sample diagram is input into the second generator, thereby obtaining an edge line distribution loss value. That is, the edge line distribution loss value is used to represent the difference between the image with the line generated when the abstract cartoon sample diagram is used as the input of the second generator and the original cartoon image corresponding to the abstract cartoon sample diagram.
The computer device can determine a target loss value corresponding to the iteration according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value. Thereby adjusting network model parameters of the two-stage neural network according to the target loss value. The network model parameters also comprise parameters corresponding to the first generator, the first discriminator and the second generator respectively.
In one embodiment, the edge line distribution loss function L_eas may be formulated, for example, as a pixel-wise distance between the generated image with lines and the corresponding original cartoon image:
L_eas = Σ ‖L(A_y) − C_GT‖_2;
wherein L(A_y) represents the image with lines generated by the second generator with the abstract cartoon sample map A_y as input, and C_GT represents the original cartoon image corresponding to the abstract cartoon sample map A_y.
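A minimal sketch of this supervised term, assuming the same per-pixel L2 reading as for the structural reconstruction loss (the exact norm used is not fixed here):

    import torch

    def edge_line_distribution_loss(lined, original_cartoon):
        # lined: L(A_y), the image with lines produced by the second generator from the
        # abstract cartoon sample image; original_cartoon: C_GT, its paired original cartoon image.
        return torch.norm(lined - original_cartoon, p=2, dim=1).mean()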
It should be noted that lines are drawn for the abstract cartoon sample map only in the training phase, so that supervised iterative training can be performed. When the trained line drawing network is used, only the image output by the first generator of the first stage is taken as input, and line drawing of the abstract cartoon sample map no longer occurs.
In this embodiment, supervised line drawing training is performed in the second stage, so that the second-stage line drawing network for generating contour edge lines can be trained accurately. When the network model is used, contour edge lines can thus be accurately drawn on the result output by the first stage based on the second-stage line drawing network, realizing the line feel of the cartoon image while avoiding messy and redundant lines.
In one embodiment, the second-stage line drawing network further comprises a second discriminator; the network model parameters further include parameters of the second discriminator.
It is to be understood that, in the present embodiment, the second-stage line drawing network is a generative adversarial network including a second generator and a second discriminator. The network model parameters of the two-stage neural network therefore further include the parameters of the second discriminator.
The purpose of the second discriminator is to distinguish the image with lines generated by the second generator from the original cartoon image in terms of line intensity. The second generator and the second discriminator are adversarial: through iterative adversarial training, the second generator learns to produce images with lines that the second discriminator cannot distinguish, so that the line intensity of the generated image gradually approaches that of the corresponding original cartoon image and the generated lines are enhanced.
It is understood that both the image with lines generated by the second generator and the abstract cartoon sample map, which lacks the true contour edge lines, are fake sample data, while the original cartoon image is the real sample data. This guides the second generator to generate line-enhanced images that cannot be distinguished by the second discriminator.
When the second generator performs adversarial training with the second discriminator, the image output by the first generator is input into the second generator, and unsupervised iterative training is performed.
In one embodiment, in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, and an edge line distribution loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, and the edge line distribution loss value includes: in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value and an edge line enhancement loss value; and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value.
Specifically, the computer device may obtain a pre-designed edge line enhancement loss function, and construct a target loss function according to a structure reconstruction loss function, a bilateral filtering smoothing loss function, a style enhancement loss function, an edge line distribution loss function, and an edge line enhancement loss function. It will be appreciated that the edge line enhancement loss function is considered in constructing the target loss function in order to approximate the intensity of the lines of the image generated by the second generator to the intensity of the lines of the original cartoon image.
Then, when the two-stage neural network is trained, the sample data set is input into the two-stage neural network for iterative training. In each iteration, the computer device may obtain, in combination with the target loss function, a structural reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value, and the edge line enhancement loss value corresponding to the current iteration, and further obtain a final target loss value, thereby iteratively adjusting the network model parameters according to the target loss value until the iteration is stopped.
It can be understood that the edge line enhancement loss value is determined by the second probability output by the second discriminator after the image with lines generated by the second generator with the image generated by the first generator as input is input to the second discriminator; and the second probability is used for representing the line intensity difference between the image with the lines generated by the second generator and the original cartoon image. That is, in each iteration, the image generated by the first generator may be input to the second generator, the image with lines generated by the second generator according to the input image may be input to the second discriminator, and the second discriminator may output a probability value, that is, a second probability. And determining an edge line enhancement loss value according to the second probability. It can be understood that the greater the second probability is, the closer the line intensity of the image generated by the second generator and the original cartoon image is, the more difficult it is for the second discriminator to recognize, thereby implementing enhancement processing on the contour edge line.
In one embodiment, the edge line enhancement adversarial loss function L_eau-D of the second discriminator may be formulated, for example, as:
L_eau-D = Σ [ log D_L(C_GT) + log(1 − D_L(L(G(I)))) + log(1 − D_L(A_y)) ];
and in one embodiment, the edge line enhancement loss function L_eau corresponding to the second generator may be formulated, for example, as:
L_eau = Σ log(1 − D_L(L(G(I))));
wherein A_y is the abstract cartoon sample map, and C_GT represents the original cartoon image corresponding to the abstract cartoon sample map A_y; D_L denotes the second discriminator and D_L(·) the result output by the second discriminator; G(I) and L(G(I)) indicate the image generated by the first generator of the first stage and the image generated by the second generator of the second stage, respectively. D_L(L(G(I))) represents the second probability output after the image generated by the second generator is input to the second discriminator. D_L(C_GT) represents the result output after the original cartoon image is input to the second discriminator; D_L(A_y) represents the result output after the abstract cartoon sample map A_y is input to the second discriminator.
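These two terms can likewise be written as losses to be minimized (a PyTorch-style sketch; the sign convention, averaging and eps are assumptions of the sketch):

    import torch

    def edge_enhance_discriminator_loss(d_real, d_fake_lined, d_fake_abstract, eps=1e-8):
        # d_real: D_L(C_GT); d_fake_lined: D_L(L(G(I))); d_fake_abstract: D_L(A_y).
        # Original cartoon images are real samples; the second generator's output and the
        # edge-line-free abstract cartoon samples are both treated as fake samples.
        real_term = torch.log(d_real + eps).mean()
        fake_terms = (torch.log(1.0 - d_fake_lined + eps).mean()
                      + torch.log(1.0 - d_fake_abstract + eps).mean())
        return -(real_term + fake_terms)

    def edge_line_enhancement_loss(d_fake_lined, eps=1e-8):
        # The second generator is trained to make D_L(L(G(I))) large.
        return torch.log(1.0 - d_fake_lined + eps).mean()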
In one embodiment, the final objective loss function L_all is a weighted sum of the above loss terms:
L_all = α·L_ssc + β·L_sty + γ·L_fla + δ·L_eas + θ·L_eau;
wherein L_ssc is the structural reconstruction loss function, L_sty the style enhancement loss function, L_fla the bilateral filtering smoothing loss function, L_eas the edge line distribution loss function, and L_eau the edge line enhancement loss function; α, β, γ, δ and θ are weight parameters. In one embodiment, α, β, γ, δ and θ may be heuristically set to 10, 2, 1.5, 50 and 1, respectively.
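Combining the terms is then a plain weighted sum, as in the sketch below; which Greek letter attaches to which term is an assumption of the sketch, only the heuristic values 10, 2, 1.5, 50 and 1 are taken from the text above.

    def total_loss(l_ssc, l_sty, l_fla, l_eas, l_eau,
                   alpha=10.0, beta=2.0, gamma=1.5, delta=50.0, theta=1.0):
        # Weighted sum of the five loss terms used as the final training objective.
        return alpha * l_ssc + beta * l_sty + gamma * l_fla + delta * l_eas + theta * l_eau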
In other words, in the second stage, the edge line distribution loss value is determined in a supervised manner and the edge line enhancement loss value is determined in an unsupervised manner. Thus, in the second stage, contour edge lines are drawn on the cartoon image generated in the first stage through supervised edge line distribution training and unsupervised edge line enhancement training, which reduces the proliferation of unnecessary lines and improves the quality of the final cartoon image.
To facilitate understanding of the two-stage neural network, a schematic description of the network structure of the two-stage neural network will now be provided with reference to fig. 3. The two-stage neural network includes a first stage abstract network and a second stage line drawing network. The structures of the first generator and the second generator are mainly illustrated in fig. 3, and the structures of the first discriminator and the second discriminator are not shown.
Referring to FIG. 3, the first generator G in the first-stage abstract network is a fully convolutional network composed of an encoder E_1 and a decoder D_1. The encoder E_1 and the decoder D_1 may each stack two convolution blocks in sequence, each block containing a convolution layer with a kernel size of 3 × 3, an instance normalization layer and a ReLU (linear rectification function) activation layer. Several residual blocks R_1 (e.g., 9) may be arranged between the encoder E_1 and the decoder D_1 to supplement useful features and eliminate artifacts.
Further, to make full use of the original input, a flat convolution block (Flat Conv) F_1e may be provided before the encoder E_1. The flat block F_1e may be set up using 64 convolution kernels of size 7 × 7 to extend the receptive field and reintegrate features so as to learn a more global representation. After the decoder D_1, a flat conversion block F_1d may be provided to convert the features output by the decoder D_1 into a three-channel output. Reflection padding is used to reduce artifacts at the image boundaries. At the end of the first generator G, a skip residual connection from input to output is used, followed by a Tanh (hyperbolic tangent) activation layer T to obtain the abstract result finally generated by the first generator, i.e., the generated abstract cartoon image.
The first discriminator in the first-stage abstract network consists of three convolution blocks with a kernel size of 4 × 4. After the normalization layer of each convolution block, a Leaky ReLU (leaky linear rectification unit) is used as the activation function. The final classification result may be obtained by converting the output features into binary values using a convolution kernel. It can be appreciated that the first discriminator thereby applies a patch-level structure to focus on local feature learning. The structure of the first discriminator is not shown in the figure.
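The first discriminator can be sketched as a small patch-level network (a PyTorch illustration; the strides, channel widths, use of instance normalization and the final sigmoid are assumptions, as the embodiment only fixes the three 4 × 4 convolution blocks, the Leaky ReLU activations and the single-channel output convolution):

    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        # Three 4x4 convolution blocks with Leaky ReLU activations, followed by a convolution
        # that maps the features to a single-channel probability map (patch-level judgement).
        def __init__(self, in_channels=3, base_channels=64):
            super().__init__()
            layers, ch = [], in_channels
            for i in range(3):
                out_ch = base_channels * (2 ** i)
                layers += [nn.Conv2d(ch, out_ch, kernel_size=4, stride=2, padding=1),
                           nn.InstanceNorm2d(out_ch),
                           nn.LeakyReLU(0.2, inplace=True)]
                ch = out_ch
            layers += [nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1), nn.Sigmoid()]
            self.model = nn.Sequential(*layers)

        def forward(self, x):
            return self.model(x)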
The structural composition of the second-stage line drawing network is similar to that of the first-stage abstract network. The second discriminator adopts the structure of the first discriminator (the structure of the second discriminator is likewise not shown in FIG. 3), and the second generator L adopts a structure similar to that of the first generator G.
As can be seen from FIG. 3, the second generator L also comprises an encoder E_2 and a decoder D_2, with flat blocks F_2e and F_2d before the encoder E_2 and after the decoder D_2 respectively, and several residual blocks R_2 between the encoder E_2 and the decoder D_2. The second generator L differs from the first generator G in two places. The first difference is that the second generator uses fewer residual blocks than the first generator (e.g., 9 residual blocks are used by the first generator and 6 residual blocks by the second generator). The second generator can use fewer residual blocks because the edge line information is simpler than the information contained in the color style, so a shallower network is more suitable. The second difference is that, compared with the first generator, the second generator does not employ a skip connection from input to output. It should be noted that FIG. 3 only illustrates the network structure of the second generator L; it does not mean that the output result G(I) of the first stage is input to F_2d and then output from F_2e. Rather, the output result G(I) of the first stage passes through the encoder in the second generator L, the encoded features then pass through the decoder of the second generator L, and the output result L(G(I)) is finally produced, i.e., the data flows sequentially through F_2e → E_2 → R_2 → D_2 → F_2d, thereby outputting L(G(I)).
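Since the two generators differ only in the number of residual blocks and the presence of the input-to-output skip connection, both can be expressed with one parameterized module (a PyTorch sketch; the downsampling strides, channel widths and the use of Tanh at the end of the second generator are assumptions not fixed by the embodiment):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
                nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
                nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
                nn.InstanceNorm2d(channels))

        def forward(self, x):
            return x + self.block(x)

    class StageGenerator(nn.Module):
        # n_res=9, use_skip=True  -> structure of the first generator G;
        # n_res=6, use_skip=False -> structure of the second generator L.
        # Input height and width are assumed divisible by 4 so the skip connection matches.
        def __init__(self, n_res=9, use_skip=True, base_channels=64):
            super().__init__()
            self.use_skip = use_skip
            self.flat_in = nn.Sequential(                    # flat conv block F_e: 64 kernels, 7x7
                nn.ReflectionPad2d(3), nn.Conv2d(3, base_channels, 7),
                nn.InstanceNorm2d(base_channels), nn.ReLU(inplace=True))
            enc, ch = [], base_channels
            for _ in range(2):                               # encoder: two 3x3 convolution blocks
                enc += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                        nn.InstanceNorm2d(ch * 2), nn.ReLU(inplace=True)]
                ch *= 2
            self.encoder = nn.Sequential(*enc)
            self.res_blocks = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_res)])
            dec = []
            for _ in range(2):                               # decoder mirrors the encoder
                dec += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2, padding=1, output_padding=1),
                        nn.InstanceNorm2d(ch // 2), nn.ReLU(inplace=True)]
                ch //= 2
            self.decoder = nn.Sequential(*dec)
            self.flat_out = nn.Sequential(                   # flat conv block F_d: back to 3 channels
                nn.ReflectionPad2d(3), nn.Conv2d(ch, 3, 7))

        def forward(self, x):
            out = self.flat_out(self.decoder(self.res_blocks(self.encoder(self.flat_in(x)))))
            if self.use_skip:                                # skip residual connection, then Tanh
                out = out + x
            return torch.tanh(out)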
A brief flow of the cartoon-like processing for a real scene will now be described with the network structure shown in fig. 3.
A brief flow in the two-stage network training process will be described. In the training process, the sample data set comprises a first set P of real scene sample graphs, a second set A of abstract cartoon sample graphs and a third set C of original cartoon images corresponding to the abstract cartoon sample graphs. Then, in each iterative training, the first set is input into a first-stage abstract network to be trained, the first generator G carries out image reconstruction processing and bilateral filtering smoothing processing, and an abstract cartoon image G (I) is output. Wherein, in the training phase, I represents a real scene sample graph in the first set P. Therefore, the structural reconstruction loss value of the training can be determined according to the structural difference between the cartoon image G (I) and the real scene sample image I, and the difference between adjacent pixels in the cartoon image G (I) is determined in consideration of the similarity of pixel values and the similarity of spatial positions, so that the bilateral filtering smoothing loss value is obtained.
The first discriminator may randomly select a preset number of abstract cartoon sample images from the second set a as a reference to discriminate the probability (i.e., the first probability) that the first generator G outputs the style of the cartoon image G (i) belonging to the abstract cartoon sample images, thereby calculating the style enhancement loss value. Shown in fig. 3 are 5 abstract cartoon sample views (5 images as shown in dashed box 302).
In each iterative training, for training of the second-stage line drawing network, one abstract cartoon sample map in the second set (e.g., A_y shown in FIG. 3) may be selected as input, the original cartoon image C_GT corresponding to the abstract cartoon sample map A_y is found from the third set as a reference, and supervised training is performed based on the abstract cartoon sample map A_y and its corresponding original cartoon image C_GT to obtain the edge line distribution loss value.
In addition, in each iterative training, the G(I) output in the first stage may be input into the second-stage line drawing network, and lines are drawn on the first-stage output G(I) by the second generator, so that the cartoon image L(G(I)) with lines is generated. The cartoon image L(G(I)) generated by the second generator is input to the second discriminator, and the second discriminator, taking the original cartoon image C_GT as a reference, outputs the probability (i.e., the second probability) characterizing the line intensity difference between the cartoon image L(G(I)) and the original cartoon image C_GT, thereby obtaining the edge line enhancement loss value.
And then, determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value in the training. And adjusting parameters of the first generator, the first discriminator, the second generator and the second discriminator according to the target loss value, entering next iterative training, and iterating until the training is stopped to obtain the trained two-stage neural network.
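Putting the pieces together, one iteration of the stacked training described above can be sketched as follows, reusing the illustrative loss functions from the sketches above; in practice, updates of the discriminators with their own adversarial losses are alternated with this step and are omitted here.

    def train_step(G, D1, L_gen, D2, real_img, abstract_sample, original_cartoon, edge_mask, weights):
        # real_img: real scene sample I; abstract_sample: A_y; original_cartoon: C_GT;
        # edge_mask: binary structural-edge mask B for real_img;
        # weights: (alpha, beta, gamma, delta, theta).
        g_out = G(real_img)                          # abstract, stylised cartoon image G(I)
        lined = L_gen(g_out)                         # second stage adds contour edge lines: L(G(I))
        lined_from_sample = L_gen(abstract_sample)   # supervised branch: L(A_y)

        l_ssc = structure_reconstruction_loss(g_out, real_img, edge_mask)
        l_fla = bilateral_smoothing_loss(g_out, real_img)
        l_sty = style_enhancement_loss(D1(g_out))
        l_eas = edge_line_distribution_loss(lined_from_sample, original_cartoon)
        l_eau = edge_line_enhancement_loss(D2(lined))
        return total_loss(l_ssc, l_sty, l_fla, l_eas, l_eau, *weights)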
After the two-stage neural network is trained, when the two-stage neural network is used for cartoon processing of a real scene image, only generators in a first-stage abstract network and a second-stage line drawing network are used for processing, and a discriminator is only used in the training process.
The input-output flow during use of the two-stage neural network will now be briefly described. A real scene image I (in the use phase, I is understood to be the real scene image to be cartoonized) is input into the first generator G, which generates an abstract cartoon image and stylizes it to produce a style abstract image with an artistic style (in the use phase, G(I) is understood to be the abstract image with artistic style output by the first generator G, i.e., the style abstract image). The style abstract image is then input into the second generator L, which outputs the final cartoon image (in the use phase, L(G(I)) is the final cartoon image output by the second generator).
In one embodiment, the iterative training includes structure reconstruction iterative training and network stacking iterative training.
It will be appreciated that the entire two-stage neural network may be iteratively trained in an end-to-end manner, i.e., the first stage abstract network and the second stage line delineation network are iteratively trained simultaneously. In addition, the iterative training can be split into structure reconstruction iterative training and network stack iterative training, so that the training process can be accelerated, and the convergence can be improved.
The structure reconstruction iterative training is used for training an initialized abstract network for image reconstruction processing. The network stacking iterative training stacks the initialized abstract network and the second-stage line drawing network together for iterative training. That is, the first-stage abstract network is initially trained through structure reconstruction training to obtain the initialized abstract network. The initialized abstract network only has the image reconstruction function and cannot yet perform abstract smoothing, so the initialized abstract network and the second-stage line drawing network are stacked together for iterative training, thereby producing a first-stage abstract network capable of image reconstruction and abstract smoothing and a second-stage line drawing network capable of generating lines.
In this embodiment, the sample data set includes a first set of real scene sample diagrams, a second set of abstract cartoon sample diagrams, and a third set of original cartoon images corresponding to the abstract cartoon sample diagrams. Training the deep neural network, comprising: inputting the first set into an abstract network to be trained for structure reconstruction iterative training, determining a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjusting parameters of a first generator according to the structure reconstruction loss value until an initialization training stop condition is reached to obtain an initialization abstract network; inputting the first set and the second set into an initialization abstract network, inputting the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration; determining a comprehensive loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value; adjusting parameters of a first generator and a first discriminator of the initialized abstract network and parameters of a second generator and a second discriminator of a second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops to obtain a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network.
Specifically, first, the computer device may pre-train an initialization abstraction network for implementing image reconstruction. That is, the computer device may input the first set into the abstract network to be trained to perform structure reconstruction iterative training, determine a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjust parameters of the first generator according to the structure reconstruction loss value until an initialization training stop condition is reached, so as to obtain an initialized abstract network. In this way, the resulting initialized abstract network can be used for image reconstruction to generate a base abstract image. In one embodiment, the initialized abstract network can be trained in a small number of iterations (e.g., 10 iterations) of the iterative training of the structure reconstruction.
Then, the computer device can input the first set and the second set into the initialized abstract network, input the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, determine a comprehensive loss value in each iteration, adjust parameters of a first generator and a first discriminator of the initialized abstract network and parameters of a second generator and a second discriminator of the second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops, and obtain a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network; the comprehensive loss value is obtained by determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value which are determined in each iteration of network stacking iterative training; and the comprehensive loss value is a target loss value determined by each iteration in the network stack iterative training. Equivalently, in each iteration of the network stacking iterative training, a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value are determined, and a target loss value in each iteration is determined according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value, so that model parameters of the initialized abstract network and the line drawing network at the second stage to be trained are adjusted, and the training in the aspects of structure reconstruction, bilateral filtering smoothing, style enhancement and edge line distribution is also performed in the network stacking iterative training.
It can be understood that the edge line distribution loss value is obtained by using the abstract cartoon sample graph in the second set as an input and using the corresponding original cartoon image as an output reference and performing supervised training, so as to train the second-stage line drawing network in a supervised manner.
In one embodiment, the composite loss value may further include an edge line enhancement loss value. It is understood that the edge line enhancement loss value is trained unsupervised to enable the second-stage line tracing network to generate enhanced, sharp contour edge lines. The training of the aspects of structure reconstruction, bilateral filtering smoothing, style enhancement, edge line distribution and edge line enhancement is performed in the network stacking iterative training.
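The two-phase schedule (structure reconstruction pre-training followed by stacked training) can be sketched as follows; the optimizer choice, learning rate, epoch counts and the assumption that edge masks are precomputed with the line tracing network are illustrative only, and the discriminators are in practice updated in alternation with their own losses rather than omitted as here.

    import torch

    def train_two_stage(G, D1, L_gen, D2, first_set, paired_set, init_epochs=10, stack_epochs=200):
        # first_set yields (real_img, edge_mask); paired_set yields
        # (real_img, edge_mask, abstract_sample, original_cartoon) built from the three sets.
        g_optim = torch.optim.Adam(G.parameters(), lr=2e-4)
        # phase 1: structure reconstruction iterative training of the abstract network only
        for _ in range(init_epochs):
            for real_img, edge_mask in first_set:
                loss = structure_reconstruction_loss(G(real_img), real_img, edge_mask)
                g_optim.zero_grad(); loss.backward(); g_optim.step()
        # phase 2: network stacking iterative training with the full set of loss terms
        gen_optim = torch.optim.Adam(list(G.parameters()) + list(L_gen.parameters()), lr=2e-4)
        for _ in range(stack_epochs):
            for real_img, edge_mask, abstract_sample, original_cartoon in paired_set:
                loss = train_step(G, D1, L_gen, D2, real_img, abstract_sample,
                                  original_cartoon, edge_mask, weights=(10.0, 2.0, 1.5, 50.0, 1.0))
                gen_optim.zero_grad(); loss.backward(); gen_optim.step()
                # (updates of D1 and D2 with their adversarial losses would be interleaved here)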
In one embodiment, the method further comprises: in the network stacking iterative training process, when one network in the initialized abstract network and the second-stage line drawing network to be trained reaches a training stopping condition and the other network does not reach the training stopping condition, stopping adjusting network model parameters of the network reaching the training stopping condition, and continuing training the other network not reaching the training stopping condition until the training stopping condition is reached.
For example, about 100 rounds of training may produce a satisfactory first stage abstract network, and training of the first stage abstract network may be stopped. For the second stage line drawing network, 200 rounds of training are required to achieve the effect. Therefore, after stopping the training of the first-stage abstract network, the training of the second-stage line drawing network may be continued until the training stop condition is reached.
In the above embodiment, an initialized abstract network for implementing image reconstruction is trained, and then the initialized abstract network and the second-stage line drawing network are stacked together for iterative training, so that convergence can be performed more quickly, and a first-stage abstract network capable of implementing image reconstruction and abstract smoothing processing and a second-stage line drawing network capable of generating lines are obtained.
It should be understood that, although the steps in the above flowcharts are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
To demonstrate the effects that the method of the present application can produce, reference is now made to fig. 4 to 7.
Referring to fig. 4, 402 in fig. 4 is an input real scene image, and 404-408 are cartoon images of different styles generated by the input real scene image. It can be understood that the deep learning network trained by the sample data sets of different styles (i.e. abstract cartoon sample maps or original cartoon images of different styles) can be used to generate cartoon images of different styles. For example, the deep learning network for generating 404 cartoon images of genre of a artist is training an abstract cartoon sample map or original cartoon images belonging to the genre of a artist as a sample data set. For another example, the deep learning network for generating 408 the cartoon image of the genre of the C artist uses an abstract cartoon sample map or an original cartoon image of the genre of the C artist as a sample data set in training.
Fig. 5 is used to show the results in different states. Referring to fig. 5, it can be seen that (a) is an input real scene image, and (b) to (e) are intermediate results. (f) The final result, i.e., the final cartoonized image. (f) Compared with the cartoon images (b) to (e), the cartoon image has higher quality and more obvious artistic style.
FIG. 6 shows the effect of a comparison experiment between the present solution and reference methods in one embodiment. FIG. 6 shows (a) an input real scene image; (b) the cartoonization effect of CycleGAN (CycleGAN is essentially two mirror-symmetric GANs forming a ring network, which can implement cyclic style conversion); (c) the cartoonization effect of CartoonGAN (a GAN-based neural network model presented at CVPR (IEEE Conference on Computer Vision and Pattern Recognition) 2018 that converts real photos into cartoon images); (d) the cartoonization effect of CartoonGAN2 (a fine-tuned, i.e., refined, CartoonGAN result); and (e) the cartoonization effect of the present solution. Each group of effect pictures contains cartoon images of three different styles, from top to bottom the style of artist A, the style of artist B and the style of artist C. As can be seen from FIG. 6, the cartoonization effect (e) of this embodiment preserves the salient structure more accurately. For example, the salient texture of the clouds, stones and character background is better preserved compared with (b) and (d); the color abstraction is smoother and the lines are sharper compared with (b) and (d), e.g., the lines of the bridge in (b) and (d) are not clear. Therefore, the cartoonization effect of the present solution is of higher quality than the cartoon images of (b) and (d).
The cartoonization effect of the present solution also avoids generating lines in areas that need to be smooth. For example, for the clouds and sea waves, the generation of messy lines is avoided so that the picture is smoother, and the problems of color overflow, structural distortion and large artifacts seen in (c) are also avoided. Therefore, the cartoonization effect of the present solution is of higher quality than the cartoon image of (c).
Fig. 7 is a comparison diagram of different styles of cartoon images implemented by the scheme of the application. The input real scene graph, the style of the artist A, the style of the artist B and the style of the artist C are sequentially arranged from left to right. It can be understood that the method according to the scheme can generate cartoon images with different artistic styles by adopting training data with different styles.
In one embodiment, as shown in fig. 8, there is provided a processing apparatus for cartoon-making an image of a real scene, where the apparatus may adopt a software module or a hardware module, or a combination of the two modules as a part of a computer device, and the apparatus specifically includes: an obtaining module 802, an abstraction processing module 804, and a line generating module 806, wherein:
an obtaining module 802, configured to obtain a real scene image.
The abstract processing module 804 is configured to perform image reconstruction processing and abstract smoothing processing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image of the real scene image mapped on the cartoon domain; the abstract cartoon image has a significant structure in the real scene image and lacks contour edge lines in the real scene image; and carrying out stylization processing on the abstract cartoon image to generate a style cartoon image with artistic style.
The line generating module 806 is configured to generate contour edge lines of the style cartoon image to obtain the cartoon image after cartoonization of the real scene image.
In one embodiment, the real scene image is extracted from the video frame sequence in sequence; the video frame sequence comprises at least two real scene images which are arranged in sequence;
the device also includes:
the output module 808 is configured to, when the video frame sequence is an image frame sequence in a video file, generate a cartoon image according to the cartoon image after cartoon of each real scene image in the video frame sequence, or, when the video file is played, obtain and output the cartoon image corresponding to each real scene image in the video frame sequence in real time.
In one embodiment, the output module 808 is further configured to output the cartoonized video stream in real time according to the cartoonized images of the real scene images in the real-time video frame sequence when the video frame sequence is an image frame sequence in the real-time video stream.
In an embodiment, the abstract processing module 804 is further configured to input the real scene image into a trained deep neural network, so as to perform semantic extraction on the real scene image in the first stage, perform image reconstruction processing on the real scene image based on the extracted semantic information, and perform bilateral filtering smoothing processing on reconstructed image content, so as to generate an abstract cartoon image.
In one embodiment, the real scene image is input into a first generator of an abstract network in a deep neural network.
As shown in fig. 9, the apparatus further includes: a model training module 801 and an output module 808. Wherein:
a model training module 801, configured to obtain a sample data set; the sample data set comprises a first set of real scene sample graphs; inputting a sample data set into a deep neural network to be trained for iterative training, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value; in each iteration, adjusting network model parameters according to a target loss value until the iteration is stopped to obtain a trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network; the structure reconstruction loss value is used for representing the difference of the structure characteristics between the cartoon image generated by the first generator according to the real scene sample diagram and the real scene sample diagram; and the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity.
In one embodiment, the abstract processing module 804 is further configured to stylize, via the first generator, the abstract cartoon image to generate a stylized cartoon image having an artistic style.
In one embodiment, the abstract network further comprises a first discriminator; the network model parameters also comprise parameters of the first discriminator; the sample data set further comprises a second set of abstract cartoon sample diagrams; the model training module 801 is further configured to determine a structural reconstruction loss value, a bilateral filtering smoothing loss value, and a style enhancement loss value in each iteration, and determine a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value, and the style enhancement loss value; the style enhancement loss value is determined according to a first probability output by the first discriminator for the cartoon image generated by the first generator; the first probability is used for representing the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample graph.
In one embodiment, the model training module 801 is further configured to obtain a third set of original cartoon images; the original cartoon images in the third set have the same artistic style; extracting contour edge lines of each original cartoon image according to a pre-trained line tracking network; generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from an original cartoon image.
In one embodiment, the deep learning network is a two-stage neural network, and the abstract network is the first-stage abstract network in the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network; the line generation module 806 is further configured to, in the second stage, input the style cartoon image generated in the first stage into a second generator in the second-stage line drawing network of the trained two-stage neural network, so as to perform contour edge line generation processing on the style cartoon image through the second generator and obtain the cartoon image after cartoonization of the real scene image; the second-stage line drawing network is a deep neural network for generating contour edge lines in the second stage.
In one embodiment, the abstract cartoon sample map is an abstract image with contour edge lines removed from the original cartoon image; the sample data set further comprises a third set of original cartoon images corresponding to the abstract cartoon sample graph; the network model parameters further include parameters of the second generator. The model training module 801 is further configured to determine a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, and an edge line distribution loss value in each iteration, and determine a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, and the edge line distribution loss value, where the edge line distribution loss value is used to represent a difference between an image with a line generated when the abstract cartoon sample diagram is used as an input of the second generator and an original cartoon image corresponding to the abstract cartoon sample diagram; the network model parameters also include parameters of the second generator.
In one embodiment, the second-stage line drawing network further comprises a second discriminator; the network model parameters also comprise parameters of the second discriminator; the model training module 801 is further configured to determine a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value, and an edge line enhancement loss value in each iteration, and determine a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value. The edge line enhancement loss value is determined by a second probability output by the second discriminator after the image with lines, generated by the second generator with the image generated by the first generator as input, is input to the second discriminator; and the second probability is used for representing the line intensity difference between the image with lines generated by the second generator and the original cartoon image.
In one embodiment, the iterative training includes structure reconstruction iterative training and network stacking iterative training. Structure reconstruction iterative training is used to train an initialized abstract network for image reconstruction processing; network stacking iterative training stacks the initialized abstract network and the second-stage line drawing network together for joint iterative training.
In this embodiment, the model training module 801 is further configured to obtain a first set of real scene sample images, a second set of abstract cartoon sample images, and a third set of original cartoon images corresponding to the abstract cartoon sample images; input the first set into an abstract network to be trained for structure reconstruction iterative training, determine a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjust parameters of the first generator according to the structure reconstruction loss value until an initialization training stop condition is reached, to obtain an initialized abstract network; input the first set and the second set into the initialized abstract network, input the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determine a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, and an edge line distribution loss value in each iteration; determine a comprehensive loss value from these loss values; and adjust parameters of the first generator and the first discriminator of the initialized abstract network, and parameters of the second generator and the second discriminator of the second-stage line drawing network to be trained, according to the comprehensive loss value until the iteration stops, to obtain a trained two-stage neural network. The two-stage neural network comprises the first-stage abstract network and the second-stage line drawing network, and the comprehensive loss value is the target loss value determined in each iteration of the network stacking iterative training.
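The two-phase schedule could look roughly like the sketch below; the loaders, optimizer settings and the simplified loss terms are illustrative assumptions, and practical adversarial training would also alternate generator and discriminator updates rather than sharing one objective.

```python
import torch
import torch.nn.functional as F

def train_two_stage(first_gen, first_disc, second_gen, second_disc,
                    real_loader, pair_loader, init_steps, stack_steps, lr=2e-4):
    """Phase 1: structure-reconstruction pre-training; Phase 2: stacked training of both stages."""
    opt_init = torch.optim.Adam(first_gen.parameters(), lr=lr)
    for _, real in zip(range(init_steps), real_loader):
        loss = F.l1_loss(first_gen(real), real)    # pixel-level stand-in for the reconstruction term
        opt_init.zero_grad()
        loss.backward()
        opt_init.step()

    params = (list(first_gen.parameters()) + list(first_disc.parameters()) +
              list(second_gen.parameters()) + list(second_disc.parameters()))
    opt_all = torch.optim.Adam(params, lr=lr)
    for _, (real, (sample, original)) in zip(range(stack_steps), zip(real_loader, pair_loader)):
        style_cartoon = first_gen(real)            # first stage: abstract + stylized cartoon image
        d1_prob = first_disc(style_cartoon)        # assumed to output a probability in (0, 1)
        l_rec = F.l1_loss(style_cartoon, real)                               # structure reconstruction
        l_style = F.binary_cross_entropy(d1_prob, torch.ones_like(d1_prob))  # style enhancement
        l_edge = F.l1_loss(second_gen(sample), original)                     # edge line distribution
        loss = l_rec + l_style + l_edge            # smoothing / edge-enhancement terms omitted for brevity
        opt_all.zero_grad()
        loss.backward()
        opt_all.step()
```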
For specific limitations of the apparatus for cartoonizing real scene images, reference may be made to the above limitations of the method for cartoonizing real scene images, which are not repeated here. All or part of the modules in the apparatus may be implemented in software, in hardware, or in a combination of the two. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the module.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements the method for cartoonizing real scene images. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store search data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the method for cartoonizing real scene images.
It will be appreciated by those skilled in the art that the structures shown in fig. 10 and fig. 11 are merely block diagrams of parts of the structures relevant to the present disclosure and do not constitute a limitation on the computer devices to which the present disclosure may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combined technical features, such combinations should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application and are described in relative detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for processing cartoon of real scene image is characterized by comprising the following steps:
acquiring a real scene image;
based on semantic information of the real scene image, performing image reconstruction processing and abstract smoothing processing on the real scene image to obtain an abstract cartoon image of the real scene image mapped on a cartoon domain; wherein the abstract cartoon image retains the salient structure of the real scene image and lacks the contour edge lines of the real scene image;
stylizing the abstract cartoon image to generate a style cartoon image with artistic style;
and performing contour edge line generation on the style cartoon image to obtain a cartoonized image of the real scene image.
2. The method according to claim 1, wherein the real scene image is sequentially extracted from a video frame sequence; and the video frame sequence comprises at least two real scene images arranged in order;
the method further comprises the following steps:
when the video frame sequence is a sequence of image frames in a video file, generating a cartoon video file according to the cartoonized images of the real scene images in the video frame sequence, or, when the video file is played, acquiring and outputting in real time the cartoonized images corresponding to the real scene images in the video frame sequence.
3. The method of claim 2, further comprising:
when the video frame sequence is a sequence of image frames in a real-time video stream, outputting a cartoon video stream according to the cartoonized images of the real scene images in the video frame sequence.
4. The method according to any one of claims 1 to 3, wherein the performing image reconstruction processing and abstract smoothing processing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image of the real scene image mapped on a cartoon domain comprises:
inputting the real scene image into a trained deep neural network to perform semantic extraction on the real scene image in a first stage;
based on the extracted semantic information, carrying out image reconstruction processing on the real scene image;
and carrying out bilateral filtering smoothing treatment on the reconstructed image content to generate an abstract cartoon image.
5. The method of claim 4, wherein the real scene image is input into a first generator of an abstract network in the deep neural network; the training step of the deep neural network comprises the following steps:
acquiring a sample data set; the sample data set comprises a first set of real scene sample graphs;
inputting the sample data set into a deep neural network to be trained for iterative training, determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value;
in each iteration, adjusting network model parameters according to the target loss value until the iteration is stopped to obtain a trained deep neural network; the network model parameters include parameters of the first generator; the trained deep neural network comprises a trained abstract network;
the structure reconstruction loss value is used for representing the difference of the structural characteristics between the cartoon image generated by the first generator according to the real scene sample image and the real scene sample image;
and the bilateral filtering smoothing loss value is used for determining the difference between adjacent pixels in the cartoon image generated by the first generator according to the pixel value similarity and the spatial position similarity.
6. The method of claim 5, wherein the stylizing the abstract cartoon image to generate a style cartoon image with artistic style comprises:
in the first stage, stylizing the abstract cartoon image through the first generator to generate a style cartoon image with artistic style.
7. The method of claim 6, wherein the abstract network further comprises a first discriminator; the network model parameters further comprise parameters of the first discriminator; and the sample data set further comprises a second set of abstract cartoon sample diagrams;
determining a structural reconstruction loss value and a bilateral filtering smoothing loss value in each iteration, and determining a target loss value according to the structural reconstruction loss value and the bilateral filtering smoothing loss value comprises:
in each iteration, determining a structural reconstruction loss value, a bilateral filtering smoothing loss value and a style enhancement loss value, and determining a target loss value according to the structural reconstruction loss value, the bilateral filtering smoothing loss value and the style enhancement loss value;
wherein the style enhancement loss value is determined according to a first probability output by the first discriminator for the cartoon image generated by the first generator; and the first probability is used for representing the probability that the cartoon image generated by the first generator belongs to the style of the abstract cartoon sample diagram.
8. The method of claim 7, wherein the obtaining of the second set comprises:
acquiring a third set of original cartoon images; the original cartoon images in the third set have the same artistic style;
extracting contour edge lines of the original cartoon images according to a pre-trained line tracking network;
generating an abstract cartoon sample picture according to the extracted contour edge lines and the original cartoon image to obtain a second set; the abstract cartoon sample picture is an abstract picture obtained by eliminating contour edge lines from the original cartoon picture.
9. The method of claim 7, wherein the deep neural network is a two-stage neural network, and the abstract network is the first-stage abstract network of the two-stage neural network; the two-stage neural network further comprises a second-stage line drawing network; and the performing contour edge line generation on the style cartoon image to obtain the cartoonized image comprises:
in the second stage, inputting the style cartoon image generated in the first stage into a second generator of the second-stage line drawing network in the trained two-stage neural network, and performing contour edge line generation on the style cartoon image through the second generator to obtain a cartoonized image of the real scene image;
wherein the second-stage line drawing network is a deep neural network used for generating contour edge lines in the second stage.
10. The method of claim 9, wherein the abstract cartoon sample map is an abstract image with contour edge lines removed from an original cartoon image; the sample data set further comprises a third set of the original cartoon images corresponding to the abstract cartoon sample map; the network model parameters further include parameters of the second generator;
in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value and a style enhancement loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value and the style enhancement loss value comprises:
determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value;
and the edge line distribution loss value is used for representing the difference between the image with the line, which is generated when the abstract cartoon sample diagram is used as the input of the second generator, and the original cartoon image corresponding to the abstract cartoon sample diagram.
11. The method of claim 10, wherein the second-stage line drawing network further comprises a second discriminator; and the network model parameters further comprise parameters of the second discriminator;
in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value, and determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value, including:
in each iteration, determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value, an edge line distribution loss value and an edge line enhancement loss value;
determining a target loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value, the edge line distribution loss value and the edge line enhancement loss value;
the edge line enhancement loss value is determined by a second probability output by the second discriminator after the image with lines, which is generated by the second generator with the image generated by the first generator as input, is input to the second discriminator; and the second probability is used for representing the line intensity difference between the image with lines generated by the second generator and the original cartoon image.
12. The method of claim 4, wherein the deep neural network training step comprises:
acquiring a first set of real scene sample images, a second set of abstract cartoon sample images and a third set of original cartoon images corresponding to the abstract cartoon sample images;
inputting the first set into an abstract network to be trained for structure reconstruction iterative training, determining a structure reconstruction loss value in each round of structure reconstruction iterative training, and adjusting parameters of the first generator according to the structure reconstruction loss value until an initialization training stop condition is reached to obtain an initialization abstract network;
inputting the first set and the second set into the initialized abstract network, inputting the second set and the third set into a second-stage line drawing network to be trained for network stacking iterative training, and determining a structure reconstruction loss value, a bilateral filtering smoothing loss value, a style enhancement loss value and an edge line distribution loss value in each iteration;
determining a comprehensive loss value according to the structure reconstruction loss value, the bilateral filtering smoothing loss value, the style enhancement loss value and the edge line distribution loss value;
adjusting parameters of the first generator and the first discriminator of the initialized abstract network and parameters of the second generator and the second discriminator of the second-stage line drawing network to be trained according to the comprehensive loss value until iteration stops to obtain a trained two-stage neural network; the two-stage neural network comprises a first-stage abstract network and a second-stage line drawing network.
13. A device for processing cartoon of real scene image, the device comprising:
the acquisition module is used for acquiring a real scene image;
the abstract processing module is used for performing image reconstruction processing and abstract smoothing processing on the real scene image based on semantic information of the real scene image to obtain an abstract cartoon image of the real scene image mapped on a cartoon domain, wherein the abstract cartoon image retains the salient structure of the real scene image and lacks the contour edge lines of the real scene image; and for stylizing the abstract cartoon image to generate a style cartoon image with artistic style;
and the line generation module is used for performing contour edge line generation on the style cartoon image to obtain a cartoonized image of the real scene image.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202010440936.1A 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium Pending CN111696028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010440936.1A CN111696028A (en) 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010440936.1A CN111696028A (en) 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111696028A true CN111696028A (en) 2020-09-22

Family

ID=72476806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010440936.1A Pending CN111696028A (en) 2020-05-22 2020-05-22 Method and device for processing cartoon of real scene image, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111696028A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132922B (en) * 2020-09-24 2024-10-15 扬州大学 Method for cartoon image and video in online class
CN112132922A (en) * 2020-09-24 2020-12-25 扬州大学 Method for realizing cartoon of images and videos in online classroom
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model
CN112508991A (en) * 2020-11-23 2021-03-16 电子科技大学 Panda photo cartoon method with separated foreground and background
CN112508991B (en) * 2020-11-23 2022-05-10 电子科技大学 Panda photo cartoon method with separated foreground and background
CN112529978A (en) * 2020-12-07 2021-03-19 四川大学 Man-machine interactive abstract picture generation method
WO2022170982A1 (en) * 2021-02-09 2022-08-18 北京字跳网络技术有限公司 Image processing method and apparatus, image generation method and apparatus, device, and medium
CN113052759A (en) * 2021-03-31 2021-06-29 华南理工大学 Scene complex text image editing method based on MASK and automatic encoder
CN113034523B (en) * 2021-04-23 2024-11-15 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and computer equipment
CN113034523A (en) * 2021-04-23 2021-06-25 腾讯科技(深圳)有限公司 Image processing method, image processing device, storage medium and computer equipment
CN113409342A (en) * 2021-05-12 2021-09-17 北京达佳互联信息技术有限公司 Training method and device for image style migration model and electronic equipment
CN113313625A (en) * 2021-05-13 2021-08-27 华南理工大学 Ink and wash painting artistic style conversion method, system, computer equipment and storage medium
CN113507573A (en) * 2021-08-13 2021-10-15 维沃移动通信(杭州)有限公司 Video generation method, video generation device, electronic device and readable storage medium
CN113838159A (en) * 2021-09-14 2021-12-24 上海任意门科技有限公司 Method, computing device and storage medium for generating cartoon image
CN113838159B (en) * 2021-09-14 2023-08-04 上海任意门科技有限公司 Method, computing device and storage medium for generating cartoon images
CN114025198A (en) * 2021-11-08 2022-02-08 深圳万兴软件有限公司 Video cartoon method, device, equipment and medium based on attention mechanism
CN114385883B (en) * 2021-12-07 2024-03-15 西北大学 Contour enhancement method for approximately simulating chapping method in style conversion
CN114385883A (en) * 2021-12-07 2022-04-22 西北大学 Contour enhancement method for approximately simulating wrinkle method in style conversion
CN114743080A (en) * 2022-03-04 2022-07-12 商汤国际私人有限公司 Image processing method and device, terminal and storage medium
CN115908962A (en) * 2022-06-13 2023-04-04 北京融合未来技术有限公司 Neural network training method, pulse signal reconstruction image generation method and device
CN115908962B (en) * 2022-06-13 2023-11-14 北京融合未来技术有限公司 Training method of neural network, pulse signal reconstruction image generation method and device
CN116012258A (en) * 2023-02-14 2023-04-25 山东大学 Image harmony method based on cyclic generation countermeasure network
CN116012258B (en) * 2023-02-14 2023-10-13 山东大学 Image harmony method based on cyclic generation countermeasure network

Similar Documents

Publication Publication Date Title
CN111696028A (en) Method and device for processing cartoon of real scene image, computer equipment and storage medium
Liu et al. Shadow removal by a lightness-guided network with training on unpaired data
CN111489287B (en) Image conversion method, device, computer equipment and storage medium
Dolhansky et al. Eye in-painting with exemplar generative adversarial networks
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Liu et al. Unsupervised sketch to photo synthesis
Tripathy et al. Facegan: Facial attribute controllable reenactment gan
CN111445410A (en) Texture enhancement method, device and equipment based on texture image and storage medium
CN109255357B (en) RGBD image collaborative saliency detection method
CN111489405B (en) Face sketch synthesis system for generating confrontation network based on condition enhancement
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
CN111275784A (en) Method and device for generating image
Galteri et al. Deep 3d morphable model refinement via progressive growing of conditional generative adversarial networks
CN114120389A (en) Network training and video frame processing method, device, equipment and storage medium
CN111640172A (en) Attitude migration method based on generation of countermeasure network
Cong et al. Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image
Igorevich Road images augmentation with synthetic traffic signs using neural networks
CN117994480A (en) Lightweight hand reconstruction and driving method
RU2713695C1 (en) Textured neural avatars
Li et al. SPN2D-GAN: semantic prior based night-to-day image-to-image translation
CN116958451B (en) Model processing, image generating method, image generating device, computer device and storage medium
Liao et al. Self-supervised random mask attention GAN in tackling pose-invariant face recognition
Khan et al. Towards monocular neural facial depth estimation: Past, present, and future
Stahl et al. Ist-style transfer with instance segmentation
Thakur et al. White-box cartoonization using an extended gan framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination