
WO2020125505A1 - Image processing system - Google Patents

Image processing system

Info

Publication number
WO2020125505A1
Authority
WO
WIPO (PCT)
Prior art keywords
style
image
images
transfer model
model
Prior art date
Application number
PCT/CN2019/124417
Other languages
French (fr)
Inventor
Song-chun ZHU
Original Assignee
Land And Fields Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Land And Fields Limited
Publication of WO2020125505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Definitions

  • the present disclosure relates generally to image processing, and more specifically to systems and methods for transferring an artistic style to an image.
  • An artistic style transfer model can receive a content image and adapt the content image to a desirable artistic style while preserving the original content.
  • One challenge in generating a style transfer model is achieving a satisfactory balance among speed (e.g., the time it takes a model to transfer a style to a content image) , flexibility (e.g., the number of styles a model can potentially transfer) , and quality (e.g., preserving the content and adapting the style) .
  • The vanilla optimization-based algorithm can produce impressive results for arbitrary styles, but it is relatively slow due to its iterative nature.
  • Fast approximation methods based on feed-forward neural networks can generate satisfactory artistic effects but are bound to only a limited number of styles.
  • Feature-matching methods can achieve arbitrary style transfer in a real-time manner but at the cost of compromised quality.
  • the present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality.
  • the generation of an artistic style transfer model includes two steps.
  • In Step 1, a system generates a neutral-style transfer model (e.g., a neural network) based on a plurality of artistic styles.
  • the neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
  • In Step 2, the system trains the neutral-style transfer model based on a target style to obtain a target-style transfer model.
  • an exemplary computer-enabled method for generating an artistic style transfer model comprises: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
  • an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: train an initial model based on a plurality of style images to obtain a neutral-style transfer model; receive a first style image in a first style; based on the first style image, train a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receive a second style image in a second style; and based on the second style image, train a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
  • an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
  • an exemplary computer-enabled method for generating an artistic style transfer model comprises: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: update an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, update an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • FIG. 1A illustrates an exemplary process for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments.
  • FIG. 1B illustrates exemplary style transfer models, according to some embodiments.
  • FIG. 2A illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
  • FIG. 2B illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
  • FIG. 2C illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
  • FIG. 3A illustrates a bi-level optimization problem, according to some embodiments.
  • FIG. 3B illustrates an exemplary set of pseudo code implementing a method for obtaining a neutral-style transfer model, according to some embodiments.
  • FIG. 4 illustrates an exemplary process for generating a target-style transfer model based on a neutral-style transfer model, according to some embodiments.
  • FIG. 5 illustrates exemplary neural network architecture, according to some embodiments.
  • FIG. 6 illustrates an exemplary process for generating a target-style transfer model, according to some embodiments.
  • FIG. 7 depicts an exemplary electronic device, according to some embodiments.
  • the present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality.
  • the generation of an artistic style transfer model includes two steps.
  • In Step 1, a system generates a neutral-style transfer model (e.g., a trained neural network) based on a plurality of artistic styles.
  • the neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
  • Step 1 is implemented by solving a bi-level optimization problem.
  • In Step 2, the system trains the neutral-style transfer model based on a target style (e.g., a style image having the target style) to obtain a target-style transfer model.
  • the neutral-style model can be trained based on any arbitrary artistic style (thus achieving flexibility) using only a few post-processing update steps (thus achieving speed) while maintaining high style transfer quality (thus achieving quality) .
  • the adaptation of a neutral-style model to a target-style model can take approximately 5 to 30 seconds.
  • the target-style transfer model can receive a content image from a user and adapt the content image to the target style while preserving the original content.
  • the target-style transfer model can be reused to adapt any number of content images (e.g., multiple images of a video) to the target style.
  • the time the target-style model takes to transfer the target style to a content image is relatively short. In some embodiments, it can take approximately 0.004 seconds to adapt a content image of size 256×256 to the target style and approximately 0.01 seconds to adapt a content image of size 512×512 to the target style.
  • Step 1 and/or Step 2 can be implemented on one or more mobile phones, one or more computers, one or more remote devices, or a combination thereof.
  • the neutral-style transfer model can be stored on a device and accessible via a web app or a mobile app such that a user can submit any content image and any style image and receive an adapted image in real-time.
  • the adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
  • Although the following description uses terms “first, ” “second, ” etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another.
  • a first style could be termed a second style, and, similarly, a second style could be termed a first style, without departing from the scope of the various described embodiments.
  • the first style and the second style are both styles, but they are not the same style.
  • The term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting, ” depending on the context.
  • The phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event] ” or “in response to detecting [the stated condition or event] , ” depending on the context.
  • FIG. 1A illustrates an exemplary process 100 for generating a target-style transfer model configured to adapt content images to the target artistic style, according to some embodiments.
  • Process 100 is performed, for example, using one or more electronic devices.
  • process 100 is performed using a client-server system, and the blocks of process 100 are divided up in any manner between the server and client device (s) .
  • process 100 is not so limited.
  • some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted.
  • additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
  • In Step 1, a system (e.g., one or more electronic devices) generates a neutral-style transfer model 106 based on a plurality of artistic styles 102.
  • the plurality of styles 102 can include a plurality of images having the plurality of styles, respectively.
  • the neutral-style transfer model 106 is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
  • In Step 2, the system trains the neutral-style transfer model 106 based on a target style 108 to obtain a target-style transfer model 112.
  • the target-style transfer model 112 can receive a content image from a user and adapt the content image to the target style while preserving the original content.
  • an expressionist transfer model can receive a content image (e.g., an image depicting a face) and adapt the content image to the expressionist style while preserving the original content (e.g., the face) .
  • the adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
  • the neutral-style transfer model 106 is stored on an electronic device as a part of a mobile app or a web app.
  • a user can provide a target style image and a content image.
  • the system can train the neutral-style transfer model 106 to obtain a target-style transfer model based on the target style image. As discussed above, this can take between approximately 5-30 seconds in some embodiments.
  • the target-style transfer model can be reused to adapt any number of content images to the target style. As discussed above, the time to adapt a content image to the target style can be well below 1 second in some embodiments.
  • Step 2 can be completed in 5 to 30 seconds. Further, the resulting model can produce a high-quality artistic style transfer. Thus, process 100 achieves a desirable balance among speed, flexibility, and quality.
  • FIG. 1B illustrates exemplary models generated as a result of the process 100, according to some embodiments.
  • After Step 1, the system obtains a neutral-style transfer model.
  • FIG. 1B shows an exemplary neutral-style transfer model 122, which can receive a content image 130 and produce an image 132 adapted to the neutral style.
  • the neutral-style transfer model 122 is configured to be trained further in Step 2 to produce a final, target-style transfer model that can receive content images and adapt the content images to the target style.
  • FIG. 1B further depicts exemplary target-style transfer models.
  • a target-style transfer model 124 results from training the neutral-style transfer model 122 based on Style A (e.g., a style image in Style A) in Step 2.
  • the model 124 is configured to receive a content image 130 and adapt the content image to Style A.
  • a target-style transfer model 126 results from training the neutral-style transfer model 122 based on Style B (e.g., a style image in Style B) in Step 2.
  • the model 126 is configured to receive a content image 130 and adapt the content image to Style B.
  • the system trains a model based on a plurality of styles 102.
  • the model can be any machine learning model.
  • the model is a neural network.
  • the parameters of the neural network can be randomly initialized.
  • the plurality of styles 102 includes a plurality of image sets corresponding to the plurality of styles. Each image set includes one or more style images in the corresponding artistic style. As shown, Step 1 results in the neutral-style transfer model 106.
  • FIG. 2A illustrates an exemplary process implementing Step 1, according to some embodiments.
  • the system selects a batch of styles from the plurality of styles.
  • the batch of styles comprises style images randomly selected from the plurality of styles 102.
  • the system calculates a loss (also referred to as “outer loss” or “aggregated outer loss” ) corresponding to the selected batch of styles, thus obtaining the loss 210.
  • the loss is a perceptual loss.
  • the loss is indicative of the distances between the images generated by the model and the selected batch of styles (i.e., style images corresponding to the batch of styles) .
  • the loss 210 is obtained by training the model based on each style of the batch of styles, calculating a loss corresponding to each style, and aggregating (and/or averaging) the losses.
  • An exemplary process of block 204 is described in detail with reference to FIG. 2B.
  • the system updates the model based on the loss 210.
  • the system updates the model (e.g., updating the parameters of the neural network) to minimize the loss 210.
  • the updating of the model can be represented algorithmically by the pseudo code below, in which θ represents parameters in the model, E represents the loss 210, and η represents the outer learning rate.
  • Blocks 202-212 can be repeated until a condition is met.
  • the condition is a predefined number of update iterations, such that the loop terminates when this number of iterations is reached.
  • the system continues to obtain a new batch of styles, obtain a loss corresponding to the new batch of styles, and update the model accordingly.
  • This repeated process can be referred to as the “outer loop” of the process 120 (i.e., Step 1) .
  • the loss 210 can be referred to as the “outer loss” or “aggregated outer loss” .
  • a neutral-style transfer model 106 is obtained.
  • FIG. 2B illustrates an exemplary process of block 204 for obtaining an outer loss corresponding to a batch of styles, according to some embodiments.
  • the system trains the model based on a particular style of the batch of styles (e.g., a style image in the particular style) .
  • the system calculates a loss (or “outer loss” ) corresponding to the particular style.
  • the outer loss corresponding to the particular style can be a perceptual loss.
  • the system samples a batch of content images from a content validation dataset and adapts the batch of content images using the model.
  • the outer loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the style image) and then aggregating (and/or averaging) the distances for all adapted images. In some embodiments, the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
  • the system updates an aggregated loss (or aggregated outer loss) corresponding to the batch of styles based on the loss corresponding to the particular style.
  • the aggregated outer loss is incremented by the loss corresponding to the particular style.
  • blocks 226 and 228 can be represented algorithmically by the pseudo code below, in which D_val represents a content validation dataset, I_s represents the particular style, and E represents the aggregated outer loss.
  • blocks 224-228 are performed for each style of the batch of styles.
  • the model is trained by a particular style and a loss (or outer loss) corresponding to the particular style is calculated.
  • the loss (or outer loss) corresponding to the batch of styles 210 is obtained.
  • the loss 210 is the sum of all outer losses corresponding to all styles in the batch of styles.
  • the loss 210 is the average of all outer losses corresponding to all styles in the batch of styles.
  • FIG. 2C illustrates an exemplary process of block 224 for training a model based on a single style, according to some embodiments.
  • the system samples a batch of content images.
  • the batch of content images is sampled from a content training dataset, which is different from the content validation dataset.
  • the system calculates a loss corresponding to the batch of content (or “inner loss” ) based on the batch of content and the particular style.
  • the system can adapt the batch of content images using the model.
  • the inner loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the particular style image) and aggregating (or averaging) the distances for all adapted images.
  • the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
  • the system updates the model.
  • the system updates the model (e.g., updating the parameters of the neural network) to minimize the inner loss.
  • the updating of the model can be represented algorithmically by the pseudo code below, in which w represents parameters in the model, L represents the inner loss, and δ represents the inner learning rate.
  • Blocks 232-236 can be repeated until a condition is met.
  • a new batch of content is sampled and the model is updated based on a loss (or inner loss) corresponding to the batch of content.
  • This repeated process is referred to as the “inner loop” of Step 1.
  • this repeated process can be represented algorithmically by the pseudo code below, in which θ represents the initial parameters of the model, T represents the number of inner updates, and D_tr represents the content training dataset. In some embodiments, T is in the range between 1 and 5.
  • FIG. 3B illustrates an exemplary set of pseudo code implementing Step 1 to obtain a neutral-style transfer model, according to some embodiments.
  • the output of the algorithm includes trained parameters ⁇ of the model.
  • the algorithm trains the model to solve a bi-level optimization problem shown in FIG. 3A (also reproduced below) :
  • Equation 2 corresponds to training the parameters of the model M in the inner loop of the process.
  • θ represents the initialized parameters of the model M.
  • w represents the trained parameters of the model M, now denoted w_{s,T} to indicate that w is trained based on a particular style s.
  • the inner loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to individual styles.
  • Equation 1 corresponds to training the parameters of the model M (e.g., a neural network) in the outer loop.
  • Equation 1 indicates that w_{s,T}, which are the trained parameters from the inner loop (Equation 2) , are the parameters of the model M in the outer loop.
  • the outer loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to batches of styles.
  • M (x; y) represents the output of the model M.
  • the input x of the model M is a content image (I_c) .
  • y represents a set of parameters of the model M.
  • the output of M is an adapted image (I_x) that preserves the content of I_c in a desirable style I_s.
  • l represents a perceptual loss indicative of the compatibility of I_x with (I_c, I_s) and can be calculated as the sum of the content difference between the content image and the solution and the style difference between the style image and the solution, denoted as l (I_x; I_c, I_s) = l_content (I_x, I_c) + l_style (I_x, I_s) . A code sketch of such a perceptual loss is given at the end of this list.
  • l can be used to calculate a loss (inner or outer) between an adapted image of a content image and the image pair (i.e., the original content image and a target style image) .
  • the calculated losses can be used to update the model in the inner loop and the outer loop.
  • Both the inner objective and the outer objective are designed to be the perceptual loss averaged across datasets.
  • the inner objective only optimizes contents in the training set, whereas the outer objective generalizes to contents in the validation set.
  • the expectation of the outer objective E_{c, s} is taken with respect to both the styles and the content images in the validation set, whereas the expectation of the inner objective E_c is taken with respect to the content images in the training set only.
  • the inner loop involves sampling a content batch from the content training dataset (D_tr)
  • the outer loop involves additionally sampling a content batch from the content validation dataset (D_val) .
  • the explicit training-validation separation in the framework forces the style transfer model to generalize to unobserved content images without over-fitting to the training set. Coupled with this separation, the system constrains the number of steps in the gradient dynamics computation to encourage quick adaptation for an arbitrary style and, at the same time, picks an image transformation network due to its efficiency and high transfer quality. These design choices serve the trade-offs among speed, flexibility, and quality.
  • FIG. 4 illustrates an exemplary process implementing Step 2, according to some embodiments.
  • the system selects a batch of content images.
  • the system calculates a loss based on the neutral-style transfer model and the target style 108. For example, the system first adapts the batch of content images using the neutral-style adaption model 106 to obtain a batch of adapted images. The system can then calculate the distance (e.g., perceptual loss) between an adapted image and the image pair (original content image, target style image 108) . The distances can be aggregated and/or averaged to obtain the loss.
  • the system updates the model based on the loss.
  • the system updates the model to minimize the loss.
  • This process can be represented by the equation below. Note that the model is initialized with the trained parameters from Step 1 (i.e., the neutral-style transfer model) .
  • the blocks 402-406 can be repeated until a condition is met.
  • the condition is a predefined number of iterations (e.g., 100) .
  • a target-style transfer model 112 is obtained.
  • the model 112 can transfer the target style to any content image with high style-transfer quality in real time.
  • FIG. 5 depicts an exemplary neural network architecture, according to some embodiments.
  • the network architecture is an image transformation network.
  • the output of the last convolution layer is unnormalized and activated using the sigmoid function to squash it into [0, 1] .
  • Upsampled convolution (which first upsamples the input and then performs convolution) and reflection padding are used to avoid checkerboard effects.
  • an instance normalization layer is appended after each convolution layer, except the last. This design forces the parameters in instance normalization layers to learn from an implicit, unobserved neutral style while keeping the model size parsimonious.
  • small-batch learning is used to approximate both the inner and outer objective.
  • the inner objective is approximated by several batches sampled from the training dataset and computed on a single style
  • the outer objective is approximated by a style batch, in which each style incurs a perceptual loss computed over a content batch sampled from the validation dataset.
  • FIG. 6 illustrates an exemplary process 600 for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments.
  • Process 600 is performed, for example, using one or more electronic devices.
  • process 600 is performed using a client-server system, and the blocks of process 600 are divided up in any manner between the server and client device (s) .
  • while portions of process 600 are described herein as being performed by an electronic device, it will be appreciated that process 600 is not so limited.
  • in process 600, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted.
  • additional steps may be performed in combination with the process 600. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
  • a system (e.g., one or more electronic devices) updates an initial model based on a plurality of style images to obtain a neutral-style transfer model.
  • the system receives a first style image in a first style.
  • based on the first style image, the system updates a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style.
  • the system receives a second style image in a second style.
  • based on the second style image, the system updates a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
  • the initial model is a neural network.
  • updating the first instance of the neutral-style transfer model to generate the first style transfer model comprises: updating the initial model based on a first batch of style images, wherein the first batch of style images is sampled from the plurality of style images; and after updating the initial model based on the first batch of style images, updating the initial model based on a second batch of style images, wherein the second batch of style images is sampled from the plurality of style images.
  • updating the initial model based on the first batch of style images comprises: obtaining an outer loss corresponding to the first batch of style images; and updating the initial model based on the outer loss.
  • updating the initial model based on the outer loss comprises updating one or more parameters of the initial model according to: θ ← θ − η ∇_θ E, where θ represents the one or more parameters of the initial model, E is based on the outer loss corresponding to the first batch of style images, and η represents an outer learning rate.
  • obtaining the outer loss corresponding to the first batch of style images comprises: performing a first training, wherein the first training comprises updating the initial model based on a first style image of the first batch of style images; after the first training, calculating a first outer loss corresponding to the first style image of the first batch of style images; performing a second training, wherein the second training comprises updating the initial model based on a second style image of the first batch of style images; after the second training, calculating a second outer loss corresponding to the second style image of the first batch of style images; and calculating the outer loss corresponding to the first batch of style images based on the first outer loss and the second outer loss.
  • the method further comprises aggregating the first outer loss and the second outer loss.
  • the method further comprises: averaging the first outer loss and the second outer loss.
  • calculating the first outer loss comprises: sampling a content image; after the first training, obtaining an adapted image corresponding to the sampled content image based on the initial model; and calculating a perceptual loss based on the sampled content image and the adapted image.
  • the content image is sampled from a validation set of content images.
  • the perceptual loss is calculated based on a content loss between the sampled content image and the adapted image and a style loss between the sampled content image and the adapted image.
  • performing the first training comprises: sampling a first batch of content images; calculating a first inner loss corresponding to the first batch of content images; updating the initial model based on the first inner loss.
  • performing the first training further comprises: sampling a second batch of content images; calculating a second inner loss corresponding to the second batch of content images; and updating the initial model based on the second inner loss.
  • the first batch of content images and the second batch of content images are sampled from a training set of content images.
  • updating the initial model based on the first inner loss comprises updating one or more parameters of the initial model according to: w ← w − δ ∇_w L, where w represents the one or more parameters, L is based on the first inner loss, and δ represents an inner learning rate.
  • an exemplary computer-enabled method for generating an artistic style transfer model comprises updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • FIG. 7 illustrates an example of a computing device in accordance with one embodiment.
  • Device 700 can be a host computer connected to a network.
  • Device 700 can be a client computer or a server.
  • device 700 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server or handheld computing device (portable electronic device) such as a phone or tablet.
  • the device can include, for example, one or more of processor 710, input device 720, output device 730, storage 740, and communication device 760.
  • Input device 720 and output device 730 can generally correspond to those described above, and can either be connectable or integrated with the computer.
  • Input device 720 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device.
  • Output device 730 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
  • Storage 740 can be any suitable device that provides storage, such as an electrical, magnetic or optical memory including a RAM, cache, hard drive, or removable storage disk.
  • Communication device 760 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device.
  • the components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly.
  • Software 750, which can be stored in storage 740 and executed by processor 710, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the devices as described above) .
  • Software 750 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
  • a computer-readable storage medium can be any medium, such as storage 740, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
  • Software 750 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
  • a transport medium can be any medium that can communicate, propagate or transport programming for use by or in connection with an instruction execution system, apparatus, or device.
  • the transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic or infrared wired or wireless propagation medium.
  • Device 700 may be connected to a network, which can be any suitable type of interconnected communication system.
  • the network can implement any suitable communications protocol and can be secured by any suitable security protocol.
  • the network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
  • Device 700 can implement any operating system suitable for operating on the network.
  • Software 750 can be written in any suitable programming language, such as C, C++, Java or Python.
  • application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.
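
The perceptual loss l(I_x; I_c, I_s) referenced in several of the excerpts above can be made concrete with a short sketch. The PyTorch code below is an illustrative sketch only, not the publication's implementation: the fixed VGG-16 feature extractor, the Gram-matrix style term, the layer indices, and the loss weights are common conventions assumed here rather than details specified by this publication.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Fixed, frozen feature extractor (torchvision >= 0.13 weights API assumed).
_vgg = models.vgg16(weights="DEFAULT").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

# Assumed layer choices: relu1_2, relu2_2, relu3_3, relu4_3 for style; relu3_3 for content.
_STYLE_LAYERS = (3, 8, 15, 22)
_CONTENT_LAYER = 15

def _features(x):
    feats, out = {}, x
    for i, layer in enumerate(_vgg):
        out = layer(out)
        if i in _STYLE_LAYERS or i == _CONTENT_LAYER:
            feats[i] = out
        if i >= max(_STYLE_LAYERS):
            break
    return feats

def _gram(f):
    # Gram matrix of a feature map, normalized by its size.
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(adapted, content, style, content_weight=1.0, style_weight=1e5):
    """l(I_x; I_c, I_s) = content term + style term (weights are placeholders).

    adapted, content, style: ImageNet-normalized NCHW image batches.
    """
    if style.size(0) == 1 and adapted.size(0) > 1:
        style = style.expand(adapted.size(0), -1, -1, -1)
    fa, fc, fs = _features(adapted), _features(content), _features(style)
    content_loss = F.mse_loss(fa[_CONTENT_LAYER], fc[_CONTENT_LAYER])
    style_loss = sum(F.mse_loss(_gram(fa[i]), _gram(fs[i])) for i in _STYLE_LAYERS)
    return content_weight * content_loss + style_weight * style_loss
```

This perceptual_loss placeholder is reused by the sketches in the detailed description below.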

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure generally relates to generating an artistic style transfer model. An exemplary method comprises training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.

Description

IMAGE PROCESSING SYSTEM
TECHNICAL FIELD
The present disclosure relates generally to image processing, and more specifically to systems and methods for transferring an artistic style to an image.
BACKGROUND ART
An artistic style transfer model can receive a content image and adapt the content image to a desirable artistic style while preserving the original content. One challenge in generating a style transfer model is achieving a satisfactory balance among speed (e.g., the time it takes a model to transfer a style to a content image) , flexibility (e.g., the number of styles a model can potentially transfer) , and quality (e.g., preserving the content and adapting the style) .
The vanilla optimization-based algorithm can produce impressive results for arbitrary styles, but it is relatively slow due to its iterative nature. Fast approximation methods based on feed-forward neural networks can generate satisfactory artistic effects but are bound to only a limited number of styles. Feature-matching methods can achieve arbitrary style transfer in a real-time manner but at the cost of compromised quality.
SUMMARY
The present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality. According to some embodiments, the generation of an artistic style transfer model includes two steps. In Step 1, a system generates a neutral-style transfer model (e.g., a neural network) based on a plurality of artistic styles. The neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) . In Step 2, the system trains the neutral-style transfer model based on a target style to obtain a target-style transfer model.
In some embodiments, an exemplary computer-enabled method for generating an artistic style transfer model comprises: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first  style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
In some embodiments, an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: train an initial model based on a plurality of style images to obtain a neutral-style transfer model; receive a first style image in a first style; based on the first style image, train a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receive a second style image in a second style; and based on the second style image, train a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
In some embodiments, an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
In some embodiments, an exemplary computer-enabled method for generating an artistic style transfer model comprises: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
In some embodiments, an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: update an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, update an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
In some embodiments, an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the various described embodiments, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
FIG. 1A illustrates an exemplary process for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments.
FIG. 1B illustrates exemplary style transfer models, according to some embodiments.
FIG. 2A illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
FIG. 2B illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
FIG. 2C illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
FIG. 3A illustrates a bi-level optimization problem, according to some embodiments.
FIG. 3B illustrates an exemplary set of pseudo code implementing a method for obtaining a neutral-style transfer model, according to some embodiments.
FIG. 4 illustrates an exemplary process for generating a target-style transfer model based on a neutral-style transfer model, according to some embodiments.
FIG. 5 illustrates exemplary neural network architecture, according to some embodiments.
FIG. 6 illustrates an exemplary process for generating a target-style transfer model, according to some embodiments.
FIG. 7 depicts an exemplary electronic device, according to some embodiments.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality. According to some embodiments, the generation of an artistic style transfer model includes two steps.
In Step 1, a system generates a neutral-style transfer model (e.g., a trained neural network) based on a plurality of artistic styles. The neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) . In some embodiments, Step 1 is implemented by solving a bi-level optimization problem.
In Step 2, the system trains the neutral-style transfer model based on a target style (e.g., a style image having the target style) to obtain a target-style transfer model. The neutral-style model can be trained based on any arbitrary artistic style (thus achieving flexibility) using only a few post-processing update steps (thus achieving speed) while maintaining high style transfer quality (thus achieving quality) . In some embodiments, the adaptation of a neutral-style model to a target-style model can take approximately 5 to 30 seconds.
The target-style transfer model can receive a content image from a user and adapt the content image to the target style while preserving the original content. The target-style transfer model can be reused to adapt any number of content images (e.g., multiple images of a video) to the target style. In some embodiments, the time the target-style model takes to transfer the target style to a content image is relatively short. In some embodiments, it can take approximately 0.004 seconds to adapt a content image of size 256×256 to the target style and approximately 0.01 seconds to adapt a content image of size 512×512 to the target style.
Step 1 and/or Step 2 can be implemented on one or more mobile phones, one or more computers, one or more remote devices, or a combination thereof. The neutral-style transfer model can be stored on a device and accessible via a web app or a mobile app such that a user can submit any content image and any style image and receive an adapted image in real-time. The adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments. Thus, the various embodiments are not intended to be limited to the examples described herein and shown, but are to be accorded the scope consistent with the claims.
The following description sets forth exemplary methods, parameters, and the like. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure but is instead provided as a description of exemplary embodiments.
Although the following description uses terms “first, ” “second, ” etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first style could be termed a second style, and, similarly, a second style could be termed a first style, without departing from the scope of  the various described embodiments. The first style and the second style are both styles, but they are not the same style.
The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a, ” “an, ” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes, ” “including, ” “comprises, ” and/or “comprising, ” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting, ” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event] ” or “in response to detecting [the stated condition or event] , ” depending on the context.
FIG. 1A illustrates an exemplary process 100 for generating a target-style transfer model configured to adapt content images to the target artistic style, according to some embodiments. Process 100 is performed, for example, using one or more electronic devices. In some examples, process 100 is performed using a client-server system, and the blocks of process 100 are divided up in any manner between the server and client device (s) . Thus, while portions of process 100 are described herein as being performed by an electronic device, it will be appreciated that process 100 is not so limited. In process 100, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
With reference to FIG. 1A, the process 100 generates the target-style transfer model in two steps. In Step 1 (labelled 120) , a system (e.g., one or more electronic devices) generates a neutral-style transfer model 106 based on a plurality of artistic styles 102. The plurality of styles 102 can include a plurality of images having the plurality of styles, respectively. The neutral-style transfer model 106 is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
In Step 2 (labelled 122) , the system trains the neutral-style transfer model 106 based on a target style 108 to obtain a target-style transfer model 112. The target-style transfer model 112 can receive a content image from a user and adapt the content image to the target style while preserving the original content. As an example, an expressionist transfer model can receive a content image (e.g., an image depicting a face) and adapt the content image to the expressionist style while preserving the original content (e.g., the face) . The adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
In some embodiments, the neutral-style transfer model 106 is stored on an electronic device as a part of a mobile app or a web app. A user can provide a target style image and a content image. Within a limited number of iterations, the system can train the neutral-style transfer model 106 to obtain a target-style transfer model based on the target style image. As discussed above, this can take between approximately 5-30 seconds in some embodiments. The target-style transfer model can be reused to adapt any number of content images to the target style. As discussed above, the time to adapt a content image to the target style can be well below 1 second in some embodiments.
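To make this workflow concrete, the sketch below shows one way Step 2 could look in PyTorch: copy the neutral-style transfer model, run a small, fixed number of update steps against the user's style image using a perceptual loss such as the perceptual_loss placeholder sketched at the end of the Definitions section above, and then reuse the resulting target-style model on any number of content images. All identifiers (neutral_model, content_loader) and the step count and learning rate are illustrative placeholders, not names or values taken from this publication.
```python
import copy
import itertools
import torch

def adapt_to_style(neutral_model, style_image, content_loader, steps=100, lr=1e-3):
    """Step 2 sketch: fine-tune a copy of the neutral-style model on one style image."""
    model = copy.deepcopy(neutral_model)              # keep the neutral model reusable
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # content_loader is assumed to yield batches of content image tensors.
    for content in itertools.islice(itertools.cycle(content_loader), steps):
        loss = perceptual_loss(model(content), content, style_image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                                      # target-style transfer model

# Reuse on any number of content images once adapted:
# target_model = adapt_to_style(neutral_model, user_style_image, content_loader)
# styled = target_model(content_image)   # milliseconds per image after adaptation
```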
Using the two-step process 100, a transfer model of any given artistic style can be generated quickly. In some examples, Step 2 can be completed in 5 to 30 seconds. Further, the resulting model can produce a high-quality artistic style transfer. Thus, process 100 achieves a desirable balance among speed, flexibility, and quality.
FIG. 1B illustrates exemplary models generated as a result of the process 100, according to some embodiments. Post Step 1, the system obtains a neutral-style transfer model. For  illustration purposes, FIG. 1B shows an exemplary neutral-style transfer model 122, which can receive a content image 130 and produce an image 132 adapted to the neutral style.
The neutral-style transfer model 122 is configured to be trained further in Step 2 to produce a final, target-style transfer model that can receive content images and adapt the content images to the target style. FIG. 1B further depicts exemplary target-style transfer models. A target-style transfer model 124 results from training the neutral-style transfer model 122 based on Style A (e.g., a style image in Style A) in Step 2. The model 124 is configured to receive a content image 130 and adapt the content image to Style A. As another example, a target-style transfer model 126 results from training the neutral-style transfer model 122 based on Style B (e.g., a style image in Style B) in Step 2. The model 126 is configured to receive a content image 130 and adapt the content image to Style B.
Returning to FIG. 1A, the process 100 is described in more detail. At block 104, the system trains a model based on a plurality of styles 102. The model can be any machine learning model. In some embodiments, the model is a neural network. The parameters of the neural network can be randomly initialized. In some embodiments, the plurality of styles 102 includes a plurality of image sets corresponding to the plurality of styles. Each image set includes one or more style images in the corresponding artistic style. As shown, Step 1 results in the neutral-style transfer model 106.
FIG. 2A illustrates an exemplary process implementing Step 1, according to some embodiments. At block 202, the system selects a batch of styles from the plurality of styles. In some embodiments, the batch of styles comprises style images randomly selected from the plurality of styles 102.
At block 204, the system calculates a loss (also referred to as “outer loss” or “aggregated outer loss” ) corresponding to the selected batch of styles, thus obtaining the loss 210. In some embodiments, the loss is a perceptual loss. In some embodiments, the loss is indicative of the distances between the images generated by the model and the selected batch of styles (i.e., style images corresponding to the batch of styles) . In some embodiments, the loss 210 is obtained by training the model based on each style of the batch of styles, calculating a loss corresponding to each style, and aggregating (and/or averaging) the losses. An exemplary process of block 204 is described in detail with reference to FIG. 2B.
After the loss corresponding to the selected batch of styles 210 is obtained, at block 212, the system updates the model based on the loss 210. In some embodiments, the system updates the model (e.g., updating the parameters of the neural network) to minimize the loss 210. As discussed further with reference to FIG. 3B, the updating of the model can be represented algorithmically by the pseudo code below, in which θ represents parameters in the model, E represents the loss 210, and η represents the outer learning rate.
θ ← θ − η·∇_θ E
Blocks 202-212 can be repeated until a condition is met. In some embodiments, the condition is a predefined number of update iterations such that the loop terminates when this number of iterations is reached. Thus, until the condition is met, the system continues to obtain a new batch of styles, obtain a loss corresponding to the new batch of styles, and update the model accordingly. This repeated process can be referred to as the “outer loop” of the process 120 (i.e., Step 1) . Further, the loss 210 can be referred to as the “outer loss” or “aggregated outer loss” . At the end of the process 120 (i.e., Step 1) , a neutral-style transfer model 106 is obtained.
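For illustration only, the outer loop of blocks 202-212 may be sketched in Python (using PyTorch) as follows. The helper outer_loss_for_style stands in for the per-style computation of FIGS. 2B and 2C (a matching sketch follows that discussion below) and is assumed to return a scalar loss that is differentiable with respect to the model's parameters; the learning rate, batch size, iteration count, and the use of plain stochastic gradient descent are illustrative assumptions rather than values prescribed by this disclosure.

import random
import torch

def train_neutral_model(model, style_images, content_train, content_val,
                        outer_loss_for_style, outer_lr=1e-3,
                        style_batch_size=4, num_outer_iters=2000):
    """Outer loop of Step 1 (blocks 202-212): sample a batch of styles, form the
    aggregated outer loss E, and update the model parameters theta.

    style_images / content_train / content_val: lists of image tensors (assumed format)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=outer_lr)   # eta, the outer learning rate
    for _ in range(num_outer_iters):          # condition: a predefined number of outer iterations
        # Block 202: randomly select a batch of style images.
        style_batch = random.sample(style_images, style_batch_size)
        # Block 204: aggregated (here, averaged) outer loss E for the selected batch.
        E = sum(outer_loss_for_style(model, s, content_train, content_val)
                for s in style_batch) / len(style_batch)
        # Block 212: theta <- theta - eta * grad_theta(E).
        optimizer.zero_grad()
        E.backward()
        optimizer.step()
    return model                              # the neutral-style transfer model 106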
FIG. 2B illustrates an exemplary process of block 204 for obtaining an outer loss corresponding to a batch of styles, according to some embodiments. At block 224, the system trains the model based on a particular style of the batch of styles (e.g., a style image in the particular style) . After the model is trained, at block 226, the system calculates a loss (or “outer loss” ) corresponding to the particular style. The outer loss corresponding to the particular style can be a perceptual loss. For example, as a part of block 226, the system samples a batch of content images from a content validation dataset and adapts the batch of content images using the model. In some embodiments, the outer loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the style image) and then aggregating (and/or averaging) the distances for all adapted images. In some embodiments, the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
At block 228, the system updates an aggregated loss (or aggregated outer loss) corresponding to the batch of styles based on the loss corresponding to the particular style. In some embodiments, the aggregated outer loss is incremented by the loss corresponding to the particular style. As discussed further with reference to FIG. 3B, blocks 226 and 228 can be represented algorithmically by the pseudo code below, in which D_val represents a content validation dataset, I_s represents the particular style, and E represents the aggregated outer loss.

sample a content batch {I_c} from D_val
increment E by the loss computed from I_s and the adapted images {M (I_c; w_{s, T}) }
As shown in FIG. 2B, blocks 224-228 are performed for each style of the batch of styles. In each iteration, the model is trained by a particular style and a loss (or outer loss) corresponding to the particular style is calculated. At the end of the process, the loss (or outer loss) corresponding to the batch of styles 210 is obtained. In some embodiments, the loss 210 is the sum of all outer losses corresponding to all styles in the batch of styles. In some embodiments, the loss 210 is the average of all outer losses corresponding to all styles in the batch of styles.
FIG. 2C illustrates an exemplary process of block 224 for training a model based on a single style, according to some embodiments. At block 232, the system samples a batch of content images. In some embodiments, the batch of content images is sampled from a content training dataset, which is different from the content validation dataset. At block 234, the system calculates a loss corresponding to the batch of content (or “inner loss” ) based on the batch of content and the particular style. As a part of block 234, the system can adapt the batch of content images using the model. The inner loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the particular style image) and aggregating (or averaging) the distances for all adapted images. In some embodiments, the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
At block 236, based on the inner loss, the system updates the model. In some embodiments, the system updates the model (e.g., updating the parameters of the neural network) to minimize the inner loss. As discussed further with reference to FIG. 3B, the updating of the model can be represented algorithmically by the pseudo code below, in which w represents parameters in the model, L represents the inner loss, and δ represents the inner learning rate.
w ← w − δ·∇_w L
Blocks 232-236 can be repeated until a condition is met. In each iteration, a new batch of content is sampled and the model is updated based on a loss (or inner loss) corresponding to the batch of content. This repeated process is referred to as the “inner loop” of Step 1. As discussed further with reference to FIG. 3B, this repeated process can be represented algorithmically by the pseudo code below, in which Θ represents parameters in the model, T represents the number of inner updates, and D_tr represents the content training dataset. In some embodiments, T is in the range between 1 and 5.

w_{s, 0} ← Θ
for t = 1, …, T:
  sample a content batch {I_c} from D_tr
  compute the inner loss L from I_s and the adapted images {M (I_c; w_{s, t−1}) }
  w_{s, t} ← w_{s, t−1} − δ·∇_w L
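A matching, illustrative sketch of the per-style computation of FIGS. 2B and 2C is given below. It adapts a copy of the model to one style for T inner steps on training content (blocks 232-236) and then scores the adapted copy on validation content (blocks 226-228). The perceptual_loss helper, the data format, and all hyper-parameters are assumptions. Because this description does not specify whether the outer gradient is propagated through the inner updates, the sketch uses a first-order approximation (in the spirit of first-order MAML): the gradient taken at the adapted parameters w_{s, T} is re-attached to the original model's parameters through a surrogate term so that the outer update sketched above can use it.

import copy
import random
import torch

def outer_loss_for_style(model, style_image, content_train, content_val,
                         perceptual_loss, inner_lr=1e-3, T=3, batch_size=4):
    """Outer loss for one style (FIG. 2B), using the inner loop of FIG. 2C.

    content_train / content_val: lists of 3xHxW image tensors (assumed format).
    perceptual_loss(adapted, content, style) -> scalar tensor (assumed helper)."""
    # Inner loop (blocks 232-236): adapt a copy of the model, starting from the
    # current initialization theta, for T steps on content from the training set.
    adapted = copy.deepcopy(model)                                   # w_{s, 0} = theta
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)   # delta, the inner learning rate
    for _ in range(T):
        contents = torch.stack(random.sample(content_train, batch_size))
        L = perceptual_loss(adapted(contents), contents, style_image)  # inner loss
        inner_opt.zero_grad()
        L.backward()
        inner_opt.step()                                   # w <- w - delta * grad_w(L)

    # Outer loss (blocks 226-228): score the adapted model on validation content.
    contents = torch.stack(random.sample(content_val, batch_size))
    outer = perceptual_loss(adapted(contents), contents, style_image)

    # First-order approximation: reuse the gradient taken at the adapted parameters
    # as the gradient with respect to theta, exposed via a surrogate term so that
    # E.backward() in the outer loop deposits it on the original model's parameters.
    grads = torch.autograd.grad(outer, list(adapted.parameters()))
    surrogate = sum((p * g.detach()).sum()
                    for p, g in zip(model.parameters(), grads))
    return outer.detach() + surrogate - surrogate.detach()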
FIG. 3B illustrates an exemplary set of pseudo code implementing Step 1 to obtain a neutral-style transfer model, according to some embodiments. As shown, the output of the algorithm includes trained parameters Θ of the model. The algorithm trains the model to solve a bi-level optimization problem shown in FIG. 3A (also reproduced below) :
Equation 1:  min_Θ E_{c, s} [l (I_c, I_s, M (I_c; w_{s, T}) ) ]

Equation 2:  subject to w_{s, t} = w_{s, t−1} − δ·∇_w E_c [l (I_c, I_s, M (I_c; w_{s, t−1}) ) ] , with w_{s, 0} = Θ, for t = 1, …, T
Equation 2 corresponds to training the parameters of the model M in the inner loop of the process. Θ represents the initialized parameters of the model M. w represents the trained parameters of the model M, now denoted w_{s, t} to indicate that w is trained based on a particular style. As discussed above with reference to FIG. 2C, the inner loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to individual styles.
Equation 1 corresponds to training the parameters of the model M (e.g., a neural network) in the outer loop. Equation 1 indicates that w_{s, T}, which are the trained parameters from the inner loop (Equation 2) , are the parameters of the model M in the outer loop. As discussed above with reference to FIG. 2A, the outer loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to batches of styles.
M (x; y) represents the output of the model M. The input x of the model M is a content image (I_c) . y represents a set of parameters of the model M. The output of M is an adapted image (I_x) that preserves the content of I_c in a desirable style I_s.
l represents a perceptual loss indicative of the compatibility of I_x and (I_c, I_s) and can be calculated as a sum of the content difference between the content image and the solution and the style difference between the style image and the solution, denoted as

l (I_c, I_s, I_x) = l_content (I_c, I_x) + l_style (I_s, I_x)
As such, l can be used to calculate a loss (inner or outer) between an adapted image of a content image and the image pair (i.e., the original content image and a target style image) . As shown by  Equations  1 and 2 and FIGS. 2A-B, the calculated losses can be used to update the model in the inner loop and the outer loop.
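For illustration, the perceptual loss l may be computed with a fixed, pretrained feature extractor in the manner of common perceptual-loss formulations: a content term comparing features of the adapted image and the content image, and a style term comparing Gram matrices of the adapted image and the style image. The use of VGG-16 (via torchvision), the particular layers, and the weighting factors below are assumptions made for this sketch and are not taken from this disclosure.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Fixed, pretrained feature extractor used only to measure differences
# (downloads ImageNet weights on first use; torchvision is an assumed dependency).
_vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

_CONTENT_LAYER = 8                  # relu2_2 (assumed choice)
_STYLE_LAYERS = (3, 8, 15, 22)      # relu1_2, relu2_2, relu3_3, relu4_3 (assumed)

def _features(x):
    wanted = set(_STYLE_LAYERS) | {_CONTENT_LAYER}
    feats, out = {}, x
    for i, layer in enumerate(_vgg):
        out = layer(out)
        if i in wanted:
            feats[i] = out
        if i >= max(wanted):
            break
    return feats

def _gram(f):
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(adapted, content, style, content_weight=1.0, style_weight=1e5):
    """l(I_c, I_s, I_x): content difference l_c(I_c, I_x) plus style difference l_s(I_s, I_x).

    adapted / content: batches of images in [0, 1]; style: a single style image.
    (VGG-specific input normalization is omitted here for brevity.)"""
    if style.dim() == 3:
        style = style.unsqueeze(0)
    fx, fc, fs = _features(adapted), _features(content), _features(style)
    content_loss = F.mse_loss(fx[_CONTENT_LAYER], fc[_CONTENT_LAYER])
    style_loss = sum(F.mse_loss(_gram(fx[i]),
                                _gram(fs[i]).expand(adapted.shape[0], -1, -1))
                     for i in _STYLE_LAYERS)
    return content_weight * content_loss + style_weight * style_loss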
Both the inner objective and the outer objective are designed to be the perceptual loss averaged across datasets. In some embodiments, the inner objective only optimizes contents in the training set, whereas the outer objective generalizes to contents in the validation set. The expectation of the outer objective E_{c, s} is taken with respect to both the styles and the content images in the validation set, whereas the expectation of the inner objective E_c is taken with respect to the content images in the training set only. With reference to the algorithm in FIG. 3B, the inner loop involves sampling a content batch from the content training dataset (D_tr) , whereas the outer loop involves additionally sampling a content batch from the content validation dataset (D_val) .
This design allows the adapted model to specialize for a single style while keeping the initialization general enough. Note that for the outer objective, w_{s, T} implicitly depends on Θ. In essence, the framework trains an initialization M (·; Θ) that can adapt to M (·; w_{s, T}) efficiently and preserve high image quality for an arbitrary style.
The explicit training-validation separation in the framework forces the style transfer model to generalize to unobserved content images without over-fitting to the training set. Coupled with this separation, the system constrains the number of steps in the gradient dynamics computation to encourage quick adaptation for an arbitrary style and, at the same time, picks an image transformation network due to its efficiency and high transfer quality. These characteristics serve the trade-off among speed, flexibility, and quality.
FIG. 4 illustrates an exemplary process implementing Step 2, according to some embodiments. At block 402, the system selects a batch of content images. At block 404, the system calculates a loss based on the neutral-style transfer model and the target style 108. For example, the system first adapts the batch of content images using the neutral-style transfer model 106 to obtain a batch of adapted images. The system can then calculate the distance (e.g., perceptual loss) between an adapted image and the image pair (original content image, target style image 108) . The distances can be aggregated and/or averaged to obtain the loss.
At block 406, the system updates the model based on the loss. In some embodiments, the system updates the model to minimize the loss. This process can be represented by the equation below. Note that the model is initialized with the trained parameters from Step 1 (i.e., the neutral-style transfer model) .
w ← w − δ·∇_w E_c [l (I_c, I_s, M (I_c; w) ) ] , with w initialized to Θ
The blocks 402-406 can be repeated until a condition is met. In some embodiments, the condition is a predefined number of iterations (e.g., 100) . At the end of the process 122, a target-style transfer model 112 is obtained. The model 112 can transfer the target style to any content image with high style-transfer quality in real time.
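An illustrative sketch of this fast-adaptation step is given below. The model is initialized with the trained parameters from Step 1 and fine-tuned for a small, fixed number of iterations against the single target style; the perceptual_loss helper, the data format, and all hyper-parameters are assumptions rather than prescribed values.

import copy
import random
import torch

def adapt_to_target_style(neutral_model, target_style, content_train,
                          perceptual_loss, lr=1e-3, num_iters=100, batch_size=4):
    """Step 2 (blocks 402-406): fast adaptation of the neutral-style transfer model
    to a single target style, yielding a reusable target-style transfer model."""
    model = copy.deepcopy(neutral_model)       # initialized with Theta from Step 1
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(num_iters):                 # condition: e.g., ~100 iterations
        # Block 402: select a batch of content images.
        contents = torch.stack(random.sample(content_train, batch_size))
        # Block 404: loss between the adapted images and the (content, target style) pair.
        loss = perceptual_loss(model(contents), contents, target_style)
        # Block 406: update the model to reduce the loss.
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                               # the target-style transfer model 112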
FIG. 5 depicts an exemplary neural network architecture, according to some embodiments. In some embodiments, the network architecture is an image transformation network. In some embodiments, the output of the last convolution layer is unnormalized and activated using the Sigmoid function to squash it into [0, 1] . Upsampled convolution, which first upsamples the input and then performs convolution, and reflection padding are used to avoid checkerboard effects. Further, an instance normalization layer is appended after each convolution layer, except the last. This design forces the parameters in instance normalization layers to learn from an implicit, unobserved neutral style while keeping the model size parsimonious.
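A minimal PyTorch rendering of these design choices is sketched below: reflection padding, upsample-then-convolve layers in place of transposed convolutions, an instance normalization layer after every convolution except the last, and a sigmoid applied to the unnormalized final convolution to squash the output into [0, 1] . The channel counts, kernel sizes, intermediate ReLU activations, and number of stages are assumptions; the exact topology of FIG. 5 is not reproduced here.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel=3, stride=1, norm=True):
    # Reflection padding before each convolution helps avoid border and checkerboard artifacts.
    layers = [nn.ReflectionPad2d(kernel // 2),
              nn.Conv2d(in_ch, out_ch, kernel, stride)]
    if norm:
        # Instance normalization appended after each convolution except the last.
        layers += [nn.InstanceNorm2d(out_ch, affine=True), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class UpsampleConv(nn.Module):
    """Upsample first, then convolve, instead of using a transposed convolution."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        self.block = conv_block(in_ch, out_ch)

    def forward(self, x):
        x = nn.functional.interpolate(x, scale_factor=self.scale, mode="nearest")
        return self.block(x)

class TransformNet(nn.Module):
    """Illustrative image transformation network (not the exact FIG. 5 topology)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 32, kernel=9),
            conv_block(32, 64, stride=2),             # downsampling stage
            conv_block(64, 128, stride=2),            # downsampling stage
            conv_block(128, 128),                     # stand-in for residual stages
            UpsampleConv(128, 64),                    # upsampled convolution
            UpsampleConv(64, 32),                     # upsampled convolution
            conv_block(32, 3, kernel=9, norm=False),  # last convolution: unnormalized
        )

    def forward(self, x):
        # Sigmoid squashes the unnormalized output of the last convolution into [0, 1].
        return torch.sigmoid(self.net(x))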
In some embodiments, small-batch learning is used to approximate both the inner and outer objective. The inner objective is approximated by several batches sampled from the  training dataset and computed on a single style, whereas the outer objective is approximated by a style batch, in which each style incurs a perceptual loss computed over a content batch sampled from the validation dataset.
FIG. 6 illustrates an exemplary process 600 for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments. Process 600 is performed, for example, using one or more electronic devices. In some examples, process 600 is performed using a client-server system, and the blocks of process 600 are divided up in any manner between the server and client device (s) . Thus, while portions of process 600 are described herein as being performed by an electronic device, it will be appreciated that process 600 is not so limited. In process 600, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 600. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
At block 602, a system (e.g., one or more electronic devices) updates an initial model based on a plurality of style images to obtain a neutral-style transfer model. At block 604, the system receives a first style image in a first style. At block 606, the system, based on the first style image, updates a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style. At block 610, the system receives a second style image in a second style. At block 612, the system, based on the second style image, updates a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
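Putting the pieces together, process 600 corresponds to the following usage sketch. The module metastyle_sketch is hypothetical and merely collects the illustrative helpers sketched earlier in this description (including a hypothetical load_images utility); the file paths and names are likewise illustrative, and the sketch shows only the shape of the process, not a prescribed implementation.

import functools
# Hypothetical module gathering the sketches above; not an actual package.
from metastyle_sketch import (TransformNet, train_neutral_model, outer_loss_for_style,
                              adapt_to_target_style, perceptual_loss, load_images)

# Block 602: train the neutral-style transfer model once, offline.
styles = load_images("styles/")               # plurality of style images
content_train = load_images("content/train/")
content_val = load_images("content/val/")
per_style = functools.partial(outer_loss_for_style, perceptual_loss=perceptual_loss)
neutral = train_neutral_model(TransformNet(), styles, content_train, content_val,
                              outer_loss_for_style=per_style)

# Blocks 604-606: a first user-provided style yields a first transfer model.
style_a = load_images("user/style_a.jpg")[0]
model_a = adapt_to_target_style(neutral, style_a, content_train, perceptual_loss)

# Blocks 610-612: a second style yields a second, independent transfer model,
# adapted from a fresh instance of the same neutral initialization.
style_b = load_images("user/style_b.jpg")[0]
model_b = adapt_to_target_style(neutral, style_b, content_train, perceptual_loss)

# Either model can now adapt any number of content images.
photo = load_images("user/photo.jpg")[0].unsqueeze(0)
image_in_style_a = model_a(photo)
image_in_style_b = model_b(photo)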
In some embodiments, the initial model is a neural network.
In some embodiments, updating the first instance of the neutral-style transfer model to generate the first style transfer model comprises: updating the initial model based on a first batch of style images, wherein the first batch of style images is sampled from the plurality of style images; and after updating the initial model based on the first batch of style images, updating the  initial model based on a second batch of style images, wherein the second batch of style images is sampled from the plurality of style images.
In some embodiments, updating the initial model based on the first batch of style images comprises: obtaining an outer loss corresponding to the first batch of style images; and updating the initial model based on the outer loss.
In some embodiments, updating the initial model based on the outer loss comprises updating one or more parameters of the initial model according to:
θ ← θ − η·∇_θ E
wherein θ represents the one or more parameters of the initial model, E is based on the outer loss corresponding to the first batch of style images, and η represents an outer learning rate.
In some embodiments, obtaining the outer loss corresponding to the first batch of style images comprises: performing a first training, wherein the first training comprises updating the initial model based on a first style image of the first batch of style images; after the first training, calculating a first outer loss corresponding to the first style image of the first batch of style images; performing a second training, wherein the second training comprises updating the initial model based on a second style image of the first batch of style images; after the second training, calculating a second outer loss corresponding to the second style image of the first batch of style images; and calculating the outer loss corresponding to the first batch of style images based on the first outer loss and the second outer loss.
In some embodiments, the method further comprises aggregating the first outer loss and the second outer loss.
In some embodiments, the method further comprises: averaging the first outer loss and the second outer loss.
In some embodiments, calculating the first outer loss comprises: sampling a content image; after the first training, obtaining an adapted image corresponding to the sampled content image based on the initial model; and calculating a perceptual loss based on the sampled content image and the adapted image.
In some embodiments, the content image is sampled from a validation set of content images.
In some embodiments, the perceptual loss is calculated based on a content loss between the sampled content image and the adapted image and a style loss between a style image and the adapted image.
In some embodiments, performing the first training comprises: sampling a first batch of content images; calculating a first inner loss corresponding to the first batch of content images; updating the initial model based on the first inner loss.
In some embodiments, performing the first training further comprises: sampling a second batch of content images; calculating a second inner loss corresponding to the second batch of content images; and updating the initial model based on the second inner loss.
In some embodiments, the first batch of content images and the second batch of content images are sampled from a training set of content images.
In some embodiments, updating the initial model based on the first inner loss comprises updating one or more parameters of the initial model according to:
w ← w − δ·∇_w L
wherein w represents the one or more parameters, L is based on the first inner loss, and δ represents an inner learning rate.
In some embodiments, an exemplary computer-enabled method for generating an artistic style transfer model comprises updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
The operations described above with reference to FIG. 6 are optionally implemented by components depicted in FIG. 7. It would be clear to a person having ordinary skill in the art how other processes are implemented based on the components depicted in FIG. 7.
FIG. 7 illustrates an example of a computing device in accordance with one embodiment. Device 700 can be a host computer connected to a network. Device 700 can be a client computer  or a server. As shown in FIG. 7, device 700 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server or handheld computing device (portable electronic device) such as a phone or tablet. The device can include, for example, one or more of processor 710, input device 720, output device 730, storage 740, and communication device 760. Input device 720 and output device 730 can generally correspond to those described above, and can either be connectable or integrated with the computer.
Input device 720 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device. Output device 730 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
Storage 740 can be any suitable device that provides storage, such as an electrical, magnetic or optical memory including a RAM, cache, hard drive, or removable storage disk. Communication device 760 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device. The components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly.
Software 750, which can be stored in storage 740 and executed by processor 710, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the devices as described above) .
Software 750 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 740, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
Software 750 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate or transport programming for use by or in connection with an instruction execution system, apparatus, or  device. The transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic or infrared wired or wireless propagation medium.
Device 700 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
Device 700 can implement any operating system suitable for operating on the network. Software 750 can be written in any suitable programming language, such as C, C++, Java or Python. In various embodiments, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.
Although the disclosure and examples have been fully described with reference to the accompanying figures, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the disclosure and examples as defined by the claims.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the techniques and their practical applications. Others skilled in the art are thereby enabled to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (22)

  1. A computer-enabled method for generating an artistic style transfer model, the method comprising:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model;
    receiving a first style image in a first style from a user;
    based on the first style image, updating a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted content image in the first style;
    receiving a second style image in a second style from the user; and
    based on the second style image, updating a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted content image in the second style.
  2. The method according to claim 1, wherein the initial model is a neural network.
  3. The method according to any of claims 1-2, wherein updating the first instance of the neutral-style transfer model to generate the first style transfer model comprises:
    updating the initial model based on a first batch of style images, wherein the first batch of style images is sampled from the plurality of style images; and
    after updating the initial model based on the first batch of style images, updating the initial model based on a second batch of style images, wherein the second batch of style images is sampled from the plurality of style images.
  4. The method according to claim 3, wherein updating the initial model based on the first batch of style images comprises:
    obtaining an outer loss corresponding to the first batch of style images; and
    updating the initial model based on the outer loss.
  5. The method according to claim 4, wherein updating the initial model based on the outer loss comprises updating one or more parameters of the initial model according to:
    θ ← θ − η·∇_θ E
    wherein θ represents the one or more parameters of the initial model, E is based on the outer loss corresponding to the first batch of style images, and η represents an outer learning rate.
  6. The method according to any of claims 4-5, wherein obtaining the outer loss corresponding to the first batch of style images comprises:
    performing a first training, wherein the first training comprises updating the initial model based on a first style image of the first batch of style images;
    after the first training, calculating a first outer loss corresponding to the first style image of the first batch of style images;
    performing a second training, wherein the second training comprises updating the initial model based on a second style image of the first batch of style images;
    after the second training, calculating a second outer loss corresponding to the second style image of the first batch of style images; and
    calculating the outer loss corresponding to the first batch of style images based on the first outer loss and the second outer loss.
  7. The method according to claim 6, further comprising: aggregating the first outer loss and the second outer loss.
  8. The method according to claim 6, further comprising: averaging the first outer loss and the second outer loss.
  9. The method according to any of claims 6-8, wherein calculating the first outer loss comprises:
    sampling a content image;
    after the first training, obtaining an adapted image corresponding to the sampled content image based on the initial model; and
    calculating a perceptual loss based on the sampled content image and the adapted image.
  10. The method according to claim 9, wherein the content image is sampled from a validation set of content images.
  11. The method according to any of claims 9-10, wherein the perceptual loss is calculated based on a content loss between the sampled content image and the adapted image and a style loss between a style image and the adapted image.
  12. The method according to any of claims 6-11, wherein performing the first training comprises:
    sampling a first batch of content images;
    calculating a first inner loss corresponding to the first batch of content images;
    updating the initial model based on the first inner loss.
  13. The method according to claim 12, wherein performing the first training further comprises:
    sampling a second batch of content images;
    calculating a second inner loss corresponding to the second batch of content images; and
    updating the initial model based on the second inner loss.
  14. The method according to any of claims 12-13, wherein the first batch of content images and the second batch of content images are sampled from a training set of content images.
  15. The method according to claim 12, wherein updating the initial model based on the first inner loss comprises updating one or more parameters of the initial model according to:
    w ← w − δ·∇_w L
    wherein w represents the one or more parameters, L is based on the first inner loss, and δ represents an inner learning rate.
  16. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:
    update an initial model based on a plurality of style images to obtain a neutral-style transfer model;
    receive a first style image in a first style;
    based on the first style image, update a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style;
    receive a second style image in a second style; and
    based on the second style image, update a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
  17. A system, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model;
    receiving a first style image in a first style;
    based on the first style image, updating a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style;
    receiving a second style image in a second style; and
    based on the second style image, updating a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
  18. A computer-enabled method for generating an artistic style transfer model, the method comprising:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and
    based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted content image in a style of the style image.
  19. The method according to claim 18, wherein the bi-level optimization process comprises:
    updating the initial model based on an outer loss corresponding to a batch of style images from the plurality of style images.
  20. The method according to claim 19, wherein the bi-level optimization process comprises:
    updating the initial model based on an inner loss corresponding to a style image from the batch of style images.
  21. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:
    update an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and
    based on a style image provided by a user, update an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  22. A system, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and
    updating, based on a style image provided by a user, an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.