
WO2020125505A1 - Image processing system - Google Patents

Image processing system

Info

Publication number
WO2020125505A1
Authority
WO
WIPO (PCT)
Prior art keywords
style
image
images
transfer model
model
Prior art date
Application number
PCT/CN2019/124417
Other languages
French (fr)
Inventor
Song-chun ZHU
Original Assignee
Land And Fields Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Land And Fields Limited
Publication of WO2020125505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Definitions

  • the present disclosure relates generally to image processing, and more specifically to systems and methods for transferring an artistic style to an image.
  • An artistic style transfer model can receive a content image and adapt the content image to a desirable artistic style while preserving the original content.
  • One challenge in generating a style transfer model is achieving a satisfactory balance among speed (e.g., the time it takes a model to transfer a style to a content image) , flexibility (e.g., the number of styles a model can potentially transfer) , and quality (e.g., preserving the content and adapting the style) .
  • The vanilla optimization-based algorithm can produce impressive results for arbitrary styles, but it is relatively slow due to its iterative nature.
  • Fast approximation methods based on feed-forward neural networks can generate satisfactory artistic effects but are bound to only a limited number of styles.
  • Feature-matching methods can achieve arbitrary style transfer in a real-time manner but at the cost of compromised quality.
  • the present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality.
  • the generation of an artistic style transfer model includes two steps.
  • In Step 1, a system generates a neutral-style transfer model (e.g., a neural network) based on a plurality of artistic styles.
  • the neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
  • In Step 2, the system trains the neutral-style transfer model based on a target style to obtain a target-style transfer model.
  • an exemplary computer-enabled method for generating an artistic style transfer model comprises: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
  • an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: train an initial model based on a plurality of style images to obtain a neutral-style transfer model; receive a first style image in a first style; based on the first style image, train a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receive a second style image in a second style; and based on the second style image, train a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
  • an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
  • an exemplary computer-enabled method for generating an artistic style transfer model comprises: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: update an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, update an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • FIG. 1A illustrates an exemplary process for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments.
  • FIG. 1B illustrates exemplary style transfer models, according to some embodiments.
  • FIG. 2A illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
  • FIG. 2B illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
  • FIG. 2C illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
  • FIG. 3A illustrates a bi-level optimization problem, according to some embodiments.
  • FIG. 3B illustrates an exemplary set of pseudo code implementing a method for obtaining a neutral-style transfer model, according to some embodiments.
  • FIG. 4 illustrates an exemplary process for generating a target-style transfer model based on a neutral-style transfer model, according to some embodiments.
  • FIG. 5 illustrates exemplary neural network architecture, according to some embodiments.
  • FIG. 6 illustrates an exemplary process for generating a target-style transfer model, according to some embodiments.
  • FIG. 7 depicts an exemplary electronic device, according to some embodiments.
  • the present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality.
  • the generation of an artistic style transfer model includes two steps.
  • In Step 1, a system generates a neutral-style transfer model (e.g., a trained neural network) based on a plurality of artistic styles.
  • the neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
  • Step 1 is implemented by solving a bi-level optimization problem.
  • In Step 2, the system trains the neutral-style transfer model based on a target style (e.g., a style image having the target style) to obtain a target-style transfer model.
  • the neutral-style model can be trained based on any arbitrary artistic style (thus achieving flexibility) using only a few post-processing update steps (thus achieving speed) while maintaining high style transfer quality (thus achieving quality) .
  • the adaptation of a neutral-style model to a target-style model can take approximately 5 to 30 seconds.
  • the target-style transfer model can receive a content image from a user and adapt the content image to the target style while preserving the original content.
  • the target-style transfer model can be reused to adapt any number of content images (e.g., multiple images of a video) to the target style.
  • the time the target-style model takes to transfer the target style to a content image is relatively short. In some embodiments, it can take approximately 0.004 seconds to adapt a content image of size 256×256 to the target style and approximately 0.01 seconds to adapt a content image of size 512×512 to the target style.
  • Step 1 and/or Step 2 can be implemented on one or more mobile phones, one or more computers, one or more remote devices, or a combination thereof.
  • the neutral-style transfer model can be stored on a device and accessible via a web app or a mobile app such that a user can submit any content image and any style image and receive an adapted image in real-time.
  • the adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
  • Although the following description uses terms “first, ” “second, ” etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another.
  • a first style could be termed a second style, and, similarly, a second style could be termed a first style, without departing from the scope of the various described embodiments.
  • the first style and the second style are both styles, but they are not the same style.
  • The term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting, ” depending on the context.
  • The phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event] ” or “in response to detecting [the stated condition or event] , ” depending on the context.
  • FIG. 1A illustrates an exemplary process 100 for generating a target-style transfer model configured to adapt content images to the target artistic style, according to some embodiments.
  • Process 100 is performed, for example, using one or more electronic devices.
  • process 100 is performed using a client-server system, and the blocks of process 100 are divided up in any manner between the server and client device (s) .
  • process 100 is not so limited.
  • some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted.
  • additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
  • In Step 1, a system (e.g., one or more electronic devices) generates a neutral-style transfer model 106 based on a plurality of artistic styles 102.
  • the plurality of styles 102 can include a plurality of images having the plurality of styles, respectively.
  • the neutral-style transfer model 106 is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
  • In Step 2, the system trains the neutral-style transfer model 106 based on a target style 108 to obtain a target-style transfer model 112.
  • the target-style transfer model 112 can receive a content image from a user and adapt the content image to the target style while preserving the original content.
  • an expressionist transfer model can receive a content image (e.g., an image depicting a face) and adapt the content image to the expressionist style while preserving the original content (e.g., the face) .
  • the adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
  • the neutral-style transfer model 106 is stored on an electronic device as a part of a mobile app or a web app.
  • a user can provide a target style image and a content image.
  • the system can train the neutral-style transfer model 106 to obtain a target-style transfer model based on the target style image. As discussed above, this can take between approximately 5-30 seconds in some embodiments.
  • the target-style transfer model can be reused to adapt any number of content images to the target style. As discussed above, the time to adapt a content image to the target style can be well below 1 second in some embodiments.
  • Step 2 can be completed in 5 to 30 seconds. Further, the resulting model can produce a high-quality artistic style transfer. Thus, process 100 achieves a desirable balance among speed, flexibility, and quality.
  • FIG. 1B illustrates exemplary models generated as a result of the process 100, according to some embodiments.
  • After Step 1, the system obtains a neutral-style transfer model.
  • FIG. 1B shows an exemplary neutral-style transfer model 122, which can receive a content image 130 and produce an image 132 adapted to the neutral style.
  • the neutral-style transfer model 122 is configured to be trained further in Step 2 to produce a final, target-style transfer model that can receive content images and adapt the content images to the target style.
  • FIG. 1B further depicts exemplary target-style transfer models.
  • a target-style transfer model 124 results from training the neutral-style transfer model 122 based on Style A (e.g., a style image in Style A) in Step 2.
  • the model 124 is configured to receive a content image 130 and adapt the content image to Style A.
  • a target-style transfer model 126 results from training the neutral-style transfer model 122 based on Style B (e.g., a style image in Style B) in Step 2.
  • the model 126 is configured to receive a content image 130 and adapt the content image to Style B.
  • the system trains a model based on a plurality of styles 102.
  • the model can be any machine learning model.
  • the model is a neural network.
  • the parameters of the neural network can be randomly initialized.
  • the plurality of styles 102 includes a plurality of image sets corresponding to the plurality of styles. Each image set includes one or more style images in the corresponding artistic style. As shown, Step 1 results in the neutral-style transfer model 106.
  • FIG. 2A illustrates an exemplary process implementing Step 1, according to some embodiments.
  • the system selects a batch of styles from the plurality of styles.
  • the batch of styles comprises style images randomly selected from the plurality of styles 102.
  • the system calculates a loss (also referred to as “outer loss” or “aggregated outer loss” ) corresponding to the selected batch of styles, thus obtaining the loss 210.
  • the loss is a perceptual loss.
  • the loss is indicative of the distances between the images generated by the model and the selected batch of styles (i.e., style images corresponding to the batch of styles) .
  • the loss 210 is obtained by training the model based on each style of the batch of styles, calculating a loss corresponding to each style, and aggregating (and/or averaging) the losses.
  • An exemplary process of block 204 is described in detail with reference to FIG. 2B.
  • the system updates the model based on the loss 210.
  • the system updates the model (e.g., updating the parameters of the neural network) to minimize the loss 210.
  • the updating of the model can be represented algorithmically by the pseudo code below, in which θ represents parameters in the model, E represents the loss 210, and η represents the outer learning rate.
  • Blocks 202-212 can be repeated until a condition is met.
  • the condition is a predefined number of update iterations, such that the loop terminates when this number of iterations is reached.
  • the system continues to obtain a new batch of styles, obtain a loss corresponding to the new batch of styles, and update the model accordingly.
  • This repeated process can be referred to as the “outer loop” of the process 120 (i.e., Step 1) .
  • the loss 210 can be referred to as the “outer loss” or “aggregated outer loss” .
  • a neutral-style transfer model 106 is obtained.
  • FIG. 2B illustrates an exemplary process of block 204 for obtaining an outer loss corresponding to a batch of styles, according to some embodiments.
  • the system trains the model based on a particular style of the batch of styles (e.g., a style image in the particular style) .
  • the system calculates a loss (or “outer loss” ) corresponding to the particular style.
  • the outer loss corresponding to the particular style can be a perceptual loss.
  • the system samples a batch of content images from a content validation dataset and adapts the batch of content images using the model.
  • the outer loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the style image) and then aggregating (and/or averaging) the distances for all adapted images. In some embodiments, the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
  • the system updates an aggregated loss (or aggregated outer loss) corresponding to the batch of styles based on the loss corresponding to the particular style.
  • the aggregated outer loss is incremented by the loss corresponding to the particular style.
  • blocks 226 and 228 can be represented algorithmically by the pseudo code below, in which D_val represents a content validation dataset, I_s represents the particular style, and E represents the aggregated outer loss.
  • blocks 224-228 are performed for each style of the batch of styles.
  • the model is trained by a particular style and a loss (or outer loss) corresponding to the particular style is calculated.
  • the loss (or outer loss) corresponding to the batch of styles 210 is obtained.
  • the loss 210 is the sum of all outer losses corresponding to all styles in the batch of styles.
  • the loss 210 is the average of all outer losses corresponding to all styles in the batch of styles.
  • FIG. 2C illustrates an exemplary process of block 224 for training a model based on a single style, according to some embodiments.
  • the system samples a batch of content images.
  • the batch of content images is sampled from a content training dataset, which is different from the content validation dataset.
  • the system calculates a loss corresponding to the batch of content (or “inner loss” ) based on the batch of content and the particular style.
  • the system can adapt the batch of content images using the model.
  • the inner loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the particular style image) and aggregating (or averaging) the distances for all adapted images.
  • the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
  • the system updates the model.
  • the system updates the model (e.g., updating the parameters of the neural network) to minimize the inner loss.
  • the updating of the model can be represented algorithmically by the pseudo code below, in which w represents parameters in the model, L represents the inner loss, and δ represents the inner learning rate.
  • Blocks 232-236 can be repeated until a condition is met.
  • a new batch of content is sampled and the model is updated based on a loss (or inner loss) corresponding to the batch of content.
  • This repeated process is referred to as the “inner loop” of Step 1.
  • this repeated process can be represented algorithmically by the pseudo code below, in which θ represents the initial parameters of the model, T represents the number of inner updates, and D_tr represents the content training dataset. In some embodiments, T is in the range between 1 and 5.
  • FIG. 3B illustrates an exemplary set of pseudo code implementing Step 1 to obtain a neutral-style transfer model, according to some embodiments.
  • the output of the algorithm includes trained parameters ⁇ of the model.
  • the algorithm trains the model to solve a bi-level optimization problem shown in FIG. 3A (also reproduced below) :
  • Equation 2 corresponds to training the parameters of the model M in the inner loop of the process.
  • θ represents the initialized parameters of the model M.
  • w represents the trained parameters of the model M, now denoted w_{s,T} to indicate that w is trained based on a particular style s.
  • the inner loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to individual styles.
  • Equation 1 corresponds to training the parameters of the model M (e.g., a neural network) in the outer loop.
  • Equation 1 indicates that w_{s,T}, which are the trained parameters from the inner loop (Equation 2) , are the parameters of the model M in the outer loop.
  • the outer loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to batches of styles.
  • M (x; y) represents the output of the model M.
  • the input x of the model M is a content image (I_c) .
  • y represents a set of parameters of the model M.
  • the output of M is an adapted image (I_x) that preserves the content of I_c in a desirable style I_s.
  • l represents a perceptual loss indicative of the compatibility of I_x with (I_c, I_s) and can be calculated as the sum of the content difference between the content image and the solution and the style difference between the style image and the solution, denoted as l (I_x; I_c, I_s) = l_content (I_x, I_c) + l_style (I_x, I_s) . A code sketch of such a perceptual loss is given at the end of this list.
  • l can be used to calculate a loss (inner or outer) between an adapted image of a content image and the image pair (i.e., the original content image and a target style image) .
  • the calculated losses can be used to update the model in the inner loop and the outer loop.
  • Both the inner objective and the outer objective are designed to be the perceptual loss averaged across datasets.
  • the inner objective only optimizes contents in the training set, whereas the outer objective generalizes to contents in the validation set.
  • the expectation of the outer objective E_{c, s} is taken with respect to both the styles and the content images in the validation set, whereas the expectation of the inner objective E_c is taken with respect to the content images in the training set only.
  • the inner loop involves sampling a content batch from the content training dataset (D_tr)
  • the outer loop involves additionally sampling a content batch from the content validation dataset (D_val) .
  • the explicit training-validation separation in the framework forces the style transfer model to generalize to unobserved content images without over-fitting to the training set. Coupled with this separation, the system constrains the number of steps in the gradient dynamics computation to encourage quick adaptation for an arbitrary style and, at the same time, picks an image transformation network due to its efficiency and high transfer quality. These design choices serve the trade-offs among speed, flexibility, and quality.
  • FIG. 4 illustrates an exemplary process implementing Step 2, according to some embodiments.
  • the system selects a batch of content images.
  • the system calculates a loss based on the neutral-style transfer model and the target style 108. For example, the system first adapts the batch of content images using the neutral-style adaption model 106 to obtain a batch of adapted images. The system can then calculate the distance (e.g., perceptual loss) between an adapted image and the image pair (original content image, target style image 108) . The distances can be aggregated and/or averaged to obtain the loss.
  • the system updates the model based on the loss.
  • the system updates the model to minimize the loss.
  • This process can be represented by the equation below. Note that the model is initialized with the trained parameters from Step 1 (i.e., the neutral-style transfer model) .
  • the blocks 402-406 can be repeated until a condition is met.
  • the condition is a predefined number of iterations (e.g., 100) .
  • a target-style transfer model 112 is obtained.
  • the model 112 can transfer the target style to any content image with high style-transfer quality in real time.
  • FIG. 5 depicts an exemplary neural network architecture, according to some embodiments.
  • the network architecture is an image transformation network.
  • the output of the last convolution layer is unnormalized and activated using the sigmoid function to squash it into [0, 1] .
  • Upsampled convolution (which first upsamples the input and then performs convolution) and reflection padding are used to avoid checkerboard effects.
  • an instance normalization layer is appended after each convolution layer, except the last. This design forces the parameters in instance normalization layers to learn from an implicit, unobserved neutral style while keeping the model size parsimonious.
  • small-batch learning is used to approximate both the inner and outer objective.
  • the inner objective is approximated by several batches sampled from the training dataset and computed on a single style
  • the outer objective is approximated by a style batch, in which each style incurs a perceptual loss computed over a content batch sampled from the validation dataset.
  • FIG. 6 illustrates an exemplary process 600 for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments.
  • Process 600 is performed, for example, using one or more electronic devices.
  • process 600 is performed using a client-server system, and the blocks of process 600 are divided up in any manner between the server and client device (s) .
  • while portions of process 600 are described herein as being performed by an electronic device, it will be appreciated that process 600 is not so limited.
  • in process 600, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted.
  • additional steps may be performed in combination with the process 600. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
  • a system (e.g., one or more electronic devices) updates an initial model based on a plurality of style images to obtain a neutral-style transfer model.
  • the system receives a first style image in a first style.
  • based on the first style image, the system updates a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style.
  • the system receives a second style image in a second style.
  • based on the second style image, the system updates a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
  • the initial model is a neural network.
  • updating the first instance of the neutral-style transfer model to generate the first style transfer model comprises: updating the initial model based on a first batch of style images, wherein the first batch of style images is sampled from the plurality of style images; and after updating the initial model based on the first batch of style images, updating the initial model based on a second batch of style images, wherein the second batch of style images is sampled from the plurality of style images.
  • updating the initial model based on the first batch of style images comprises: obtaining an outer loss corresponding to the first batch of style images; and updating the initial model based on the outer loss.
  • updating the initial model based on the outer loss comprises updating one or more parameters of the initial model according to: θ ← θ − η ∇_θ E, where θ represents the one or more parameters of the initial model, E is based on the outer loss corresponding to the first batch of style images, and η represents an outer learning rate.
  • obtaining the outer loss corresponding to the first batch of style images comprises: performing a first training, wherein the first training comprises updating the initial model based on a first style image of the first batch of style images; after the first training, calculating a first outer loss corresponding to the first style image of the first batch of style images; performing a second training, wherein the second training comprises updating the initial model based on a second style image of the first batch of style images; after the second training, calculating a second outer loss corresponding to the second style image of the first batch of style images; and calculating the outer loss corresponding to the first batch of style images based on the first outer loss and the second outer loss.
  • the method further comprises aggregating the first outer loss and the second outer loss.
  • the method further comprises: averaging the first outer loss and the second outer loss.
  • calculating the first outer loss comprises: sampling a content image; after the first training, obtaining an adapted image corresponding to the sampled content image based on the initial model; and calculating a perceptual loss based on the sampled content image and the adapted image.
  • the content image is sampled from a validation set of content images.
  • the perceptual loss is calculated based on a content loss between the sampled content image and the adapted image and a style loss between the sampled content image and the adapted image.
  • performing the first training comprises: sampling a first batch of content images; calculating a first inner loss corresponding to the first batch of content images; updating the initial model based on the first inner loss.
  • performing the first training further comprises: sampling a second batch of content images; calculating a second inner loss corresponding to the second batch of content images; and updating the initial model based on the second inner loss.
  • the first batch of content images and the second batch of content images are sampled from a training set of content images.
  • updating the initial model based on the first inner loss comprises updating one or more parameters of the initial model according to: w ← w − δ ∇_w L, where w represents the one or more parameters, L is based on the first inner loss, and δ represents an inner learning rate.
  • an exemplary computer-enabled method for generating an artistic style transfer model comprises updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  • FIG. 7 illustrates an example of a computing device in accordance with one embodiment.
  • Device 700 can be a host computer connected to a network.
  • Device 700 can be a client computer or a server.
  • device 700 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server or handheld computing device (portable electronic device) such as a phone or tablet.
  • the device can include, for example, one or more of processor 710, input device 720, output device 730, storage 740, and communication device 760.
  • Input device 720 and output device 730 can generally correspond to those described above, and can either be connectable or integrated with the computer.
  • Input device 720 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device.
  • Output device 730 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
  • Storage 740 can be any suitable device that provides storage, such as an electrical, magnetic or optical memory including a RAM, cache, hard drive, or removable storage disk.
  • Communication device 760 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device.
  • the components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly.
  • Software 750, which can be stored in storage 740 and executed by processor 710, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the devices as described above) .
  • Software 750 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
  • a computer-readable storage medium can be any medium, such as storage 740, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
  • Software 750 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
  • a transport medium can be any medium that can communicate, propagate or transport programming for use by or in connection with an instruction execution system, apparatus, or device.
  • the transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic or infrared wired or wireless propagation medium.
  • Device 700 may be connected to a network, which can be any suitable type of interconnected communication system.
  • the network can implement any suitable communications protocol and can be secured by any suitable security protocol.
  • the network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
  • Device 700 can implement any operating system suitable for operating on the network.
  • Software 750 can be written in any suitable programming language, such as C, C++, Java or Python.
  • application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.
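
The perceptual loss l(I_x; I_c, I_s) referenced in several of the excerpts above can be made concrete with a short sketch. The PyTorch code below is an illustrative sketch only, not the publication's implementation: the fixed VGG-16 feature extractor, the Gram-matrix style term, the layer indices, and the loss weights are common conventions assumed here rather than details specified by this publication.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Fixed, frozen feature extractor (torchvision >= 0.13 weights API assumed).
_vgg = models.vgg16(weights="DEFAULT").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

# Assumed layer choices: relu1_2, relu2_2, relu3_3, relu4_3 for style; relu3_3 for content.
_STYLE_LAYERS = (3, 8, 15, 22)
_CONTENT_LAYER = 15

def _features(x):
    feats, out = {}, x
    for i, layer in enumerate(_vgg):
        out = layer(out)
        if i in _STYLE_LAYERS or i == _CONTENT_LAYER:
            feats[i] = out
        if i >= max(_STYLE_LAYERS):
            break
    return feats

def _gram(f):
    # Gram matrix of a feature map, normalized by its size.
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(adapted, content, style, content_weight=1.0, style_weight=1e5):
    """l(I_x; I_c, I_s) = content term + style term (weights are placeholders).

    adapted, content, style: ImageNet-normalized NCHW image batches.
    """
    if style.size(0) == 1 and adapted.size(0) > 1:
        style = style.expand(adapted.size(0), -1, -1, -1)
    fa, fc, fs = _features(adapted), _features(content), _features(style)
    content_loss = F.mse_loss(fa[_CONTENT_LAYER], fc[_CONTENT_LAYER])
    style_loss = sum(F.mse_loss(_gram(fa[i]), _gram(fs[i])) for i in _STYLE_LAYERS)
    return content_weight * content_loss + style_weight * style_loss
```

This perceptual_loss placeholder is reused by the sketches in the detailed description below.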

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure generally relates to generating an artistic style transfer model. An exemplary method comprises training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.

Description

IMAGE PROCESSING SYSTEM
TECHNICAL FIELD
The present disclosure relates generally to image processing, and more specifically to systems and methods for transferring an artistic style to an image.
BACKGROUND ART
An artistic style transfer model can receive a content image and adapt the content image to a desirable artistic style while preserving the original content. One challenge in generating a style transfer model is achieving a satisfactory balance among speed (e.g., the time it takes a model to transfer a style to a content image) , flexibility (e.g., the number of styles a model can potentially transfer) , and quality (e.g., preserving the content and adapting the style) .
The vanilla optimization-based algorithm can produce impressive results for arbitrary styles, but it is relatively slow due to its iterative nature. Fast approximation methods based on feed-forward neural networks can generate satisfactory artistic effects but are bound to only a limited number of styles. Feature-matching methods can achieve arbitrary style transfer in a real-time manner but at the cost of compromised quality.
SUMMARY
The present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality. According to some embodiments, the generation of an artistic style transfer model includes two steps. In Step 1, a system generates a neutral-style transfer model (e.g., a neural network) based on a plurality of artistic styles. The neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) . In Step 2, the system trains the neutral-style transfer model based on a target style to obtain a target-style transfer model.
In some embodiments, an exemplary computer-enabled method for generating an artistic style transfer model comprises: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first  style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
In some embodiments, an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: train an initial model based on a plurality of style images to obtain a neutral-style transfer model; receive a first style image in a first style; based on the first style image, train a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receive a second style image in a second style; and based on the second style image, train a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
In some embodiments, an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: training an initial model based on a plurality of style images to obtain a neutral-style transfer model; receiving a first style image in a first style; based on the first style image, training a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and provide an adapted image in the first style; receiving a second style image in a second style; and based on the second style image, training a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and provide an adapted image in the second style.
In some embodiments, an exemplary computer-enabled method for generating an artistic style transfer model comprises: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
In some embodiments, an exemplary non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: update an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, update an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
In some embodiments, an exemplary system comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the various described embodiments, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
FIG. 1A illustrates an exemplary process for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments.
FIG. 1B illustrates exemplary style transfer models, according to some embodiments.
FIG. 2A illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
FIG. 2B illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
FIG. 2C illustrates an exemplary process for generating a neutral-style transfer model, according to some embodiments.
FIG. 3A illustrates a bi-level optimization problem, according to some embodiments.
FIG. 3B illustrates an exemplary set of pseudo code implementing a method for obtaining a neutral-style transfer model, according to some embodiments.
FIG. 4 illustrates an exemplary process for generating a target-style transfer model based on a neutral-style transfer model, according to some embodiments.
FIG. 5 illustrates exemplary neural network architecture, according to some embodiments.
FIG. 6 illustrates an exemplary process for generating a target-style transfer model, according to some embodiments.
FIG. 7 depicts an exemplary electronic device, according to some embodiments.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The present invention is directed to generation of artistic style transfer models that achieves a balance among speed, flexibility, and quality. According to some embodiments, the generation of an artistic style transfer model includes two steps.
In Step 1, a system generates a neutral-style transfer model (e.g., a trained neural network) based on a plurality of artistic styles. The neutral-style transfer model is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) . In some embodiments, Step 1 is implemented by solving a bi-level optimization problem.
In Step 2, the system trains the neutral-style transfer model based on a target style (e.g., a style image having the target style) to obtain a target-style transfer model. The neutral-style model can be trained based on any arbitrary artistic style (thus achieving flexibility) using only a few post-processing update steps (thus achieving speed) while maintaining high style transfer quality (thus achieving quality) . In some embodiments, the adaptation of a neutral-style model to a target-style model can take approximately 5 to 30 seconds.
The target-style transfer model can receive a content image from a user and adapt the content image to the target style while preserving the original content. The target-style transfer model can be reused to adapt any number of content images (e.g., multiple images of a video) to the target style. In some embodiments, the time the target-style model takes to transfer the target style to a content image is relatively short. In some embodiments, it can take approximately 0.004 seconds to adapt a content image of size 256×256 to the target style and approximately 0.01 seconds to adapt a content image of size 512×512 to the target style.
Step 1 and/or Step 2 can be implemented on one or more mobile phones, one or more computers, one or more remote devices, or a combination thereof. The neutral-style transfer model can be stored on a device and accessible via a web app or a mobile app such that a user can submit any content image and any style image and receive an adapted image in real-time. The adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments. Thus, the various embodiments are not intended to be limited to the examples described herein and shown, but are to be accorded the scope consistent with the claims.
The following description sets forth exemplary methods, parameters, and the like. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure but is instead provided as a description of exemplary embodiments.
Although the following description uses terms “first, ” “second, ” etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first style could be termed a second style, and, similarly, a second style could be termed a first style, without departing from the scope of  the various described embodiments. The first style and the second style are both styles, but they are not the same style.
The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a, ” “an, ” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes, ” “including, ” “comprises, ” and/or “comprising, ” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting, ” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event] ” or “in response to detecting [the stated condition or event] , ” depending on the context.
FIG. 1A illustrates an exemplary process 100 for generating a target-style transfer model configured to adapt content images to the target artistic style, according to some embodiments. Process 100 is performed, for example, using one or more electronic devices. In some examples, process 100 is performed using a client-server system, and the blocks of process 100 are divided up in any manner between the server and client device (s) . Thus, while portions of process 100 are described herein as being performed by an electronic device, it will be appreciated that process 100 is not so limited. In process 100, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
With reference to FIG. 1A, the process 100 generates the target-style transfer model in two steps. In Step 1 (labelled 120) , a system (e.g., one or more electronic devices) generates a neutral-style transfer model 106 based on a plurality of artistic styles 102. The plurality of styles 102 can include a plurality of images having the plurality of styles, respectively. The neutral-style transfer model 106 is a model that can be quickly trained further to result in a transfer model of any given artistic style (e.g., pop art style, expressionist style) .
In Step 2 (labelled 122) , the system trains the neutral-style transfer model 106 based on a target style 108 to obtain a target-style transfer model 112. The target-style transfer model 112 can receive a content image from a user and adapt the content image to the target style while preserving the original content. As an example, an expressionist transfer model can receive a content image (e.g., an image depicting a face) and adapt the content image to the expressionist style while preserving the original content (e.g., the face) . The adapted image can be provided to the user in a variety of ways, such as displayed on an electronic display, downloaded as a file, sent to and printed on a printer (e.g., paper printer, fabric printer, 3D printer, plastic printer) , added to a different file (e.g., a video animation file) , etc.
In some embodiments, the neutral-style transfer model 106 is stored on an electronic device as a part of a mobile app or a web app. A user can provide a target style image and a content image. Within a limited number of iterations, the system can train the neutral-style transfer model 106 to obtain a target-style transfer model based on the target style image. As discussed above, this can take between approximately 5-30 seconds in some embodiments. The target-style transfer model can be reused to adapt any number of content images to the target style. As discussed above, the time to adapt a content image to the target style can be well below 1 second in some embodiments.
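To make this workflow concrete, the sketch below shows one way Step 2 could look in PyTorch: copy the neutral-style transfer model, run a small, fixed number of update steps against the user's style image using a perceptual loss such as the perceptual_loss placeholder sketched at the end of the Definitions section above, and then reuse the resulting target-style model on any number of content images. All identifiers (neutral_model, content_loader) and the step count and learning rate are illustrative placeholders, not names or values taken from this publication.
```python
import copy
import itertools
import torch

def adapt_to_style(neutral_model, style_image, content_loader, steps=100, lr=1e-3):
    """Step 2 sketch: fine-tune a copy of the neutral-style model on one style image."""
    model = copy.deepcopy(neutral_model)              # keep the neutral model reusable
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # content_loader is assumed to yield batches of content image tensors.
    for content in itertools.islice(itertools.cycle(content_loader), steps):
        loss = perceptual_loss(model(content), content, style_image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                                      # target-style transfer model

# Reuse on any number of content images once adapted:
# target_model = adapt_to_style(neutral_model, user_style_image, content_loader)
# styled = target_model(content_image)   # milliseconds per image after adaptation
```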
Using the two-step process 100, a transfer model of any given artistic style can be generated quickly. In some examples, Step 2 can be completed in 5 to 30 seconds. Further, the resulting model can produce a high-quality artistic style transfer. Thus, process 100 achieves a desirable balance among speed, flexibility, and quality.
FIG. 1B illustrates exemplary models generated as a result of the process 100, according to some embodiments. Post Step 1, the system obtains a neutral-style transfer model. For  illustration purposes, FIG. 1B shows an exemplary neutral-style transfer model 122, which can receive a content image 130 and produce an image 132 adapted to the neutral style.
The neutral-style transfer model 122 is configured to be trained further in Step 2 to produce a final, target-style transfer model that can receive content images and adapt the content images to the target style. FIG. 1B further depicts exemplary target-style transfer models. A target-style transfer model 124 results from training the neutral-style transfer model 122 based on Style A (e.g., a style image in Style A) in Step 2. The model 124 is configured to receive a content image 130 and adapt the content image to Style A. As another example, a target-style transfer model 126 results from training the neutral-style transfer model 122 based on Style B (e.g., a style image in Style B) in Step 2. The model 126 is configured to receive a content image 130 and adapt the content image to Style B.
Returning to FIG. 1A, the process 100 is described in more detail. At block 104, the system trains a model based on a plurality of styles 102. The model can be any machine learning model. In some embodiments, the model is a neural network. The parameters of the neural network can be randomly initialized. In some embodiments, the plurality of styles 102 includes a plurality of image sets corresponding to the plurality of styles. Each image set includes one or more style images in the corresponding artistic style. As shown, Step 1 results in the neutral-style transfer model 106.
FIG. 2A illustrates an exemplary process implementing Step 1, according to some embodiments. At block 202, the system selects a batch of styles from the plurality of styles. In some embodiments, the batch of styles comprises style images randomly selected from the plurality of styles 102.
At block 204, the system calculates a loss (also referred to as “outer loss” or “aggregated outer loss” ) corresponding to the selected batch of styles, thus obtaining the loss 210. In some embodiments, the loss is a perceptual loss. In some embodiments, the loss is indicative of the distances between the images generated by the model and the selected batch of styles (i.e., style images corresponding to the batch of styles) . In some embodiments, the loss 210 is obtained by training the model based on each style of the batch of styles, calculating a loss corresponding to each style, and aggregating (and/or averaging) the losses. An exemplary process of block 204 is described in detail with reference to FIG. 2B.
After the loss corresponding to the selected batch of styles 210 is obtained, at block 212, the system updates the model based on the loss 210. In some embodiments, the system updates the model (e.g., updating the parameters of the neural network) to minimize the loss 210. As discussed further with reference to FIG. 3B, the updating of the model can be represented algorithmically by the pseudo code below, in which θ represents parameters in the model, E represents the loss 210, and η represents the outer learning rate.
θ ← θ − η·∇_θ E
Blocks 202-212 can be repeated until a condition is met. In some embodiments, the condition is a predefined number of update iterations such that the loop terminates when this number of iterations is reached. Thus, until the condition is met, the system continues to obtain a new batch of styles, obtain a loss corresponding to the new batch of styles, and update the model accordingly. This repeated process can be referred to as the “outer loop” of the process 120 (i.e., Step 1) . Further, the loss 210 can be referred to as the “outer loss” or “aggregated outer loss” . At the end of the process 120 (i.e., Step 1) , a neutral-style transfer model 106 is obtained.
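For illustration only, the outer loop of blocks 202-212 may be sketched in Python (using PyTorch) as follows. The helper outer_loss_for_style stands in for the per-style computation of FIGS. 2B and 2C (a matching sketch follows that discussion below) and is assumed to return a scalar loss that is differentiable with respect to the model's parameters; the learning rate, batch size, iteration count, and the use of plain stochastic gradient descent are illustrative assumptions rather than values prescribed by this disclosure.

import random
import torch

def train_neutral_model(model, style_images, content_train, content_val,
                        outer_loss_for_style, outer_lr=1e-3,
                        style_batch_size=4, num_outer_iters=2000):
    """Outer loop of Step 1 (blocks 202-212): sample a batch of styles, form the
    aggregated outer loss E, and update the model parameters theta.

    style_images / content_train / content_val: lists of image tensors (assumed format)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=outer_lr)   # eta, the outer learning rate
    for _ in range(num_outer_iters):          # condition: a predefined number of outer iterations
        # Block 202: randomly select a batch of style images.
        style_batch = random.sample(style_images, style_batch_size)
        # Block 204: aggregated (here, averaged) outer loss E for the selected batch.
        E = sum(outer_loss_for_style(model, s, content_train, content_val)
                for s in style_batch) / len(style_batch)
        # Block 212: theta <- theta - eta * grad_theta(E).
        optimizer.zero_grad()
        E.backward()
        optimizer.step()
    return model                              # the neutral-style transfer model 106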
FIG. 2B illustrates an exemplary process of block 204 for obtaining an outer loss corresponding to a batch of styles, according to some embodiments. At block 224, the system trains the model based on a particular style of the batch of styles (e.g., a style image in the particular style) . After the model is trained, at block 226, the system calculates a loss (or “outer loss” ) corresponding to the particular style. The outer loss corresponding to the particular style can be a perceptual loss. For example, as a part of block 226, the system samples a batch of content images from a content validation dataset and adapts the batch of content images using the model. In some embodiments, the outer loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the style image) and then aggregating (and/or averaging) the distances for all adapted images. In some embodiments, the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
At block 228, the system updates an aggregated loss (or aggregated outer loss) corresponding to the batch of styles based on the loss corresponding to the particular style. In some embodiments, the aggregated outer loss is incremented by the loss corresponding to the particular style. As discussed further with reference to FIG. 3B, blocks 226 and 228 can be represented algorithmically by the pseudo code below, in which D_val represents a content validation dataset, I_s represents the particular style, and E represents the aggregated outer loss.

sample a content batch {I_c} from D_val
increment E by the loss computed from I_s and the adapted images {M (I_c; w_{s, T}) }
As shown in FIG. 2B, blocks 224-228 are performed for each style of the batch of styles. In each iteration, the model is trained by a particular style and a loss (or outer loss) corresponding to the particular style is calculated. At the end of the process, the loss (or outer loss) corresponding to the batch of styles 210 is obtained. In some embodiments, the loss 210 is the sum of all outer losses corresponding to all styles in the batch of styles. In some embodiments, the loss 210 is the average of all outer losses corresponding to all styles in the batch of styles.
FIG. 2C illustrates an exemplary process of block 224 for training a model based on a single style, according to some embodiments. At block 232, the system samples a batch of content images. In some embodiments, the batch of content images is sampled from a content training dataset, which is different from the content validation dataset. At block 234, the system calculates a loss corresponding to the batch of content (or “inner loss” ) based on the batch of content and the particular style. As a part of block 234, the system can adapt the batch of content images using the model. The inner loss can be determined by calculating the distance between each adapted image and the particular style (i.e., the particular style image) and aggregating (or averaging) the distances for all adapted images. In some embodiments, the distance is calculated using a perceptual loss formula, as discussed with reference to FIG. 3A.
At block 236, based on the inner loss, the system updates the model. In some embodiments, the system updates the model (e.g., updating the parameters of the neural network) to minimize the inner loss. As discussed further with reference to FIG. 3B, the updating of the model can be represented algorithmically by the pseudo code below, in which w represents parameters in the model, L represents the inner loss, and δ represents the inner learning rate.
w ← w − δ·∇_w L
Blocks 232-236 can be repeated until a condition is met. In each iteration, a new batch of content is sampled and the model is updated based on a loss (or inner loss) corresponding to the batch of content. This repeated process is referred to as the “inner loop” of Step 1. As discussed further with reference to FIG. 3B, this repeated process can be represented algorithmically by the pseudo code below, in which Θ represents parameters in the model, T represents the number of inner updates, and D_tr represents the content training dataset. In some embodiments, T is in the range between 1 and 5.

w_{s, 0} ← Θ
for t = 1, …, T:
  sample a content batch {I_c} from D_tr
  compute the inner loss L from I_s and the adapted images {M (I_c; w_{s, t−1}) }
  w_{s, t} ← w_{s, t−1} − δ·∇_w L
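A matching, illustrative sketch of the per-style computation of FIGS. 2B and 2C is given below. It adapts a copy of the model to one style for T inner steps on training content (blocks 232-236) and then scores the adapted copy on validation content (blocks 226-228). The perceptual_loss helper, the data format, and all hyper-parameters are assumptions. Because this description does not specify whether the outer gradient is propagated through the inner updates, the sketch uses a first-order approximation (in the spirit of first-order MAML): the gradient taken at the adapted parameters w_{s, T} is re-attached to the original model's parameters through a surrogate term so that the outer update sketched above can use it.

import copy
import random
import torch

def outer_loss_for_style(model, style_image, content_train, content_val,
                         perceptual_loss, inner_lr=1e-3, T=3, batch_size=4):
    """Outer loss for one style (FIG. 2B), using the inner loop of FIG. 2C.

    content_train / content_val: lists of 3xHxW image tensors (assumed format).
    perceptual_loss(adapted, content, style) -> scalar tensor (assumed helper)."""
    # Inner loop (blocks 232-236): adapt a copy of the model, starting from the
    # current initialization theta, for T steps on content from the training set.
    adapted = copy.deepcopy(model)                                   # w_{s, 0} = theta
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)   # delta, the inner learning rate
    for _ in range(T):
        contents = torch.stack(random.sample(content_train, batch_size))
        L = perceptual_loss(adapted(contents), contents, style_image)  # inner loss
        inner_opt.zero_grad()
        L.backward()
        inner_opt.step()                                   # w <- w - delta * grad_w(L)

    # Outer loss (blocks 226-228): score the adapted model on validation content.
    contents = torch.stack(random.sample(content_val, batch_size))
    outer = perceptual_loss(adapted(contents), contents, style_image)

    # First-order approximation: reuse the gradient taken at the adapted parameters
    # as the gradient with respect to theta, exposed via a surrogate term so that
    # E.backward() in the outer loop deposits it on the original model's parameters.
    grads = torch.autograd.grad(outer, list(adapted.parameters()))
    surrogate = sum((p * g.detach()).sum()
                    for p, g in zip(model.parameters(), grads))
    return outer.detach() + surrogate - surrogate.detach()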
FIG. 3B illustrates an exemplary set of pseudo code implementing Step 1 to obtain a neutral-style transfer model, according to some embodiments. As shown, the output of the algorithm includes trained parameters Θ of the model. The algorithm trains the model to solve a bi-level optimization problem shown in FIG. 3A (also reproduced below) :
Equation 1:  min_Θ E_{c, s} [l (I_c, I_s, M (I_c; w_{s, T}) ) ]

Equation 2:  subject to w_{s, t} = w_{s, t−1} − δ·∇_w E_c [l (I_c, I_s, M (I_c; w_{s, t−1}) ) ] , with w_{s, 0} = Θ, for t = 1, …, T
Equation 2 corresponds to training the parameters of the model M in the inner loop of the process. Θ represents the initialized parameters of the model M. w represents the trained parameters of the model M, now denoted w_{s, t} to indicate that w is trained based on a particular style. As discussed above with reference to FIG. 2C, the inner loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to individual styles.
Equation 1 corresponds to training the parameters of the model M (e.g., a neural network) in the outer loop. Equation 1 indicates that w_{s, T}, which are the trained parameters from the inner loop (Equation 2) , are the parameters of the model M in the outer loop. As discussed above with reference to FIG. 2A, the outer loop of the process trains the model such that the model is optimized (e.g., loss is minimized) with respect to batches of styles.
M (x; y) represents the output of the model M. The input x of the model M is a content image (I_c) . y represents a set of parameters of the model M. The output of M is an adapted image (I_x) that preserves the content of I_c in a desirable style I_s.
l represents a perceptual loss indicative of the compatibility of I_x and (I_c, I_s) and can be calculated as a sum of the content difference between the content image and the solution and the style difference between the style image and the solution, denoted as

l (I_c, I_s, I_x) = l_content (I_c, I_x) + l_style (I_s, I_x)
As such, l can be used to calculate a loss (inner or outer) between an adapted image of a content image and the image pair (i.e., the original content image and a target style image) . As shown by  Equations  1 and 2 and FIGS. 2A-B, the calculated losses can be used to update the model in the inner loop and the outer loop.
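For illustration, the perceptual loss l may be computed with a fixed, pretrained feature extractor in the manner of common perceptual-loss formulations: a content term comparing features of the adapted image and the content image, and a style term comparing Gram matrices of the adapted image and the style image. The use of VGG-16 (via torchvision), the particular layers, and the weighting factors below are assumptions made for this sketch and are not taken from this disclosure.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Fixed, pretrained feature extractor used only to measure differences
# (downloads ImageNet weights on first use; torchvision is an assumed dependency).
_vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

_CONTENT_LAYER = 8                  # relu2_2 (assumed choice)
_STYLE_LAYERS = (3, 8, 15, 22)      # relu1_2, relu2_2, relu3_3, relu4_3 (assumed)

def _features(x):
    wanted = set(_STYLE_LAYERS) | {_CONTENT_LAYER}
    feats, out = {}, x
    for i, layer in enumerate(_vgg):
        out = layer(out)
        if i in wanted:
            feats[i] = out
        if i >= max(wanted):
            break
    return feats

def _gram(f):
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(adapted, content, style, content_weight=1.0, style_weight=1e5):
    """l(I_c, I_s, I_x): content difference l_c(I_c, I_x) plus style difference l_s(I_s, I_x).

    adapted / content: batches of images in [0, 1]; style: a single style image.
    (VGG-specific input normalization is omitted here for brevity.)"""
    if style.dim() == 3:
        style = style.unsqueeze(0)
    fx, fc, fs = _features(adapted), _features(content), _features(style)
    content_loss = F.mse_loss(fx[_CONTENT_LAYER], fc[_CONTENT_LAYER])
    style_loss = sum(F.mse_loss(_gram(fx[i]),
                                _gram(fs[i]).expand(adapted.shape[0], -1, -1))
                     for i in _STYLE_LAYERS)
    return content_weight * content_loss + style_weight * style_loss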
Both the inner objective and the outer objective are designed to be the perceptual loss averaged across datasets. In some embodiments, the inner objective only optimizes contents in the training set, whereas the outer objective generalizes to contents in the validation set. The expectation of the outer objective E_{c, s} is taken with respect to both the styles and the content images in the validation set, whereas the expectation of the inner objective E_c is taken with respect to the content images in the training set only. With reference to the algorithm in FIG. 3B, the inner loop involves sampling a content batch from the content training dataset (D_tr) , whereas the outer loop involves additionally sampling a content batch from the content validation dataset (D_val) .
This design allows the adapted model to specialize for a single style while keeping the initialization general enough. Note that for the outer objective, w_{s, T} implicitly depends on Θ. In essence, the framework trains an initialization M (·; Θ) that can adapt to M (·; w_{s, T}) efficiently and preserve high image quality for an arbitrary style.
The explicit training-validation separation in the framework forces the style transfer model to generalize to unobserved content images without over-fitting to the training set. Coupled with this separation, the system constrains the number of steps in the gradient dynamics computation to encourage quick adaptation for an arbitrary style and, at the same time, picks an image transformation network due to its efficiency and high transfer quality. These characteristics serve the trade-off among speed, flexibility, and quality.
FIG. 4 illustrates an exemplary process implementing Step 2, according to some embodiments. At block 402, the system selects a batch of content images. At block 404, the system calculates a loss based on the neutral-style transfer model and the target style 108. For example, the system first adapts the batch of content images using the neutral-style transfer model 106 to obtain a batch of adapted images. The system can then calculate the distance (e.g., perceptual loss) between an adapted image and the image pair (original content image, target style image 108) . The distances can be aggregated and/or averaged to obtain the loss.
At block 406, the system updates the model based on the loss. In some embodiments, the system updates the model to minimize the loss. This process can be represented by the equation below. Note that the model is initialized with the trained parameters from Step 1 (i.e., the neutral-style transfer model) .
w ← w − δ·∇_w E_c [l (I_c, I_s, M (I_c; w) ) ] , with w initialized to Θ
The blocks 402-406 can be repeated until a condition is met. In some embodiments, the condition is a predefined number of iterations (e.g., 100) . At the end of the process 122, a target-style transfer model 112 is obtained. The model 112 can transfer the target style to any content image with high style-transfer quality in real time.
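An illustrative sketch of this fast-adaptation step is given below. The model is initialized with the trained parameters from Step 1 and fine-tuned for a small, fixed number of iterations against the single target style; the perceptual_loss helper, the data format, and all hyper-parameters are assumptions rather than prescribed values.

import copy
import random
import torch

def adapt_to_target_style(neutral_model, target_style, content_train,
                          perceptual_loss, lr=1e-3, num_iters=100, batch_size=4):
    """Step 2 (blocks 402-406): fast adaptation of the neutral-style transfer model
    to a single target style, yielding a reusable target-style transfer model."""
    model = copy.deepcopy(neutral_model)       # initialized with Theta from Step 1
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(num_iters):                 # condition: e.g., ~100 iterations
        # Block 402: select a batch of content images.
        contents = torch.stack(random.sample(content_train, batch_size))
        # Block 404: loss between the adapted images and the (content, target style) pair.
        loss = perceptual_loss(model(contents), contents, target_style)
        # Block 406: update the model to reduce the loss.
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                               # the target-style transfer model 112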
FIG. 5 depicts an exemplary neural network architecture, according to some embodiments. In some embodiments, the network architecture is an image transformation network. In some embodiments, the output of the last convolution layer is unnormalized and activated using the Sigmoid function to squash it into [0, 1] . Upsampled convolution, which first upsamples the input and then performs convolution, and reflection padding are used to avoid checkerboard effects. Further, an instance normalization layer is appended after each convolution layer, except the last. This design forces the parameters in instance normalization layers to learn from an implicit, unobserved neutral style while keeping the model size parsimonious.
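A minimal PyTorch rendering of these design choices is sketched below: reflection padding, upsample-then-convolve layers in place of transposed convolutions, an instance normalization layer after every convolution except the last, and a sigmoid applied to the unnormalized final convolution to squash the output into [0, 1] . The channel counts, kernel sizes, intermediate ReLU activations, and number of stages are assumptions; the exact topology of FIG. 5 is not reproduced here.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel=3, stride=1, norm=True):
    # Reflection padding before each convolution helps avoid border and checkerboard artifacts.
    layers = [nn.ReflectionPad2d(kernel // 2),
              nn.Conv2d(in_ch, out_ch, kernel, stride)]
    if norm:
        # Instance normalization appended after each convolution except the last.
        layers += [nn.InstanceNorm2d(out_ch, affine=True), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class UpsampleConv(nn.Module):
    """Upsample first, then convolve, instead of using a transposed convolution."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        self.block = conv_block(in_ch, out_ch)

    def forward(self, x):
        x = nn.functional.interpolate(x, scale_factor=self.scale, mode="nearest")
        return self.block(x)

class TransformNet(nn.Module):
    """Illustrative image transformation network (not the exact FIG. 5 topology)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 32, kernel=9),
            conv_block(32, 64, stride=2),             # downsampling stage
            conv_block(64, 128, stride=2),            # downsampling stage
            conv_block(128, 128),                     # stand-in for residual stages
            UpsampleConv(128, 64),                    # upsampled convolution
            UpsampleConv(64, 32),                     # upsampled convolution
            conv_block(32, 3, kernel=9, norm=False),  # last convolution: unnormalized
        )

    def forward(self, x):
        # Sigmoid squashes the unnormalized output of the last convolution into [0, 1].
        return torch.sigmoid(self.net(x))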
In some embodiments, small-batch learning is used to approximate both the inner and outer objective. The inner objective is approximated by several batches sampled from the  training dataset and computed on a single style, whereas the outer objective is approximated by a style batch, in which each style incurs a perceptual loss computed over a content batch sampled from the validation dataset.
FIG. 6 illustrates an exemplary process 600 for generating a target-style transfer model configured to adapt a given content image to the target artistic style, according to some embodiments. Process 600 is performed, for example, using one or more electronic devices. In some examples, process 600 is performed using a client-server system, and the blocks of process 600 are divided up in any manner between the server and client device (s) . Thus, while portions of process 600 are described herein as being performed by an electronic device, it will be appreciated that process 600 is not so limited. In process 600, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 600. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
At block 602, a system (e.g., one or more electronic devices) updates an initial model based on a plurality of style images to obtain a neutral-style transfer model. At block 604, the system receives a first style image in a first style. At block 606, the system, based on the first style image, updates a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style. At block 610, the system receives a second style image in a second style. At block 612, the system, based on the second style image, updates a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
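Putting the pieces together, process 600 corresponds to the following usage sketch. The module metastyle_sketch is hypothetical and merely collects the illustrative helpers sketched earlier in this description (including a hypothetical load_images utility); the file paths and names are likewise illustrative, and the sketch shows only the shape of the process, not a prescribed implementation.

import functools
# Hypothetical module gathering the sketches above; not an actual package.
from metastyle_sketch import (TransformNet, train_neutral_model, outer_loss_for_style,
                              adapt_to_target_style, perceptual_loss, load_images)

# Block 602: train the neutral-style transfer model once, offline.
styles = load_images("styles/")               # plurality of style images
content_train = load_images("content/train/")
content_val = load_images("content/val/")
per_style = functools.partial(outer_loss_for_style, perceptual_loss=perceptual_loss)
neutral = train_neutral_model(TransformNet(), styles, content_train, content_val,
                              outer_loss_for_style=per_style)

# Blocks 604-606: a first user-provided style yields a first transfer model.
style_a = load_images("user/style_a.jpg")[0]
model_a = adapt_to_target_style(neutral, style_a, content_train, perceptual_loss)

# Blocks 610-612: a second style yields a second, independent transfer model,
# adapted from a fresh instance of the same neutral initialization.
style_b = load_images("user/style_b.jpg")[0]
model_b = adapt_to_target_style(neutral, style_b, content_train, perceptual_loss)

# Either model can now adapt any number of content images.
photo = load_images("user/photo.jpg")[0].unsqueeze(0)
image_in_style_a = model_a(photo)
image_in_style_b = model_b(photo)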
In some embodiments, the initial model is a neural network.
In some embodiments, updating the first instance of the neutral-style transfer model to generate the first style transfer model comprises: updating the initial model based on a first batch of style images, wherein the first batch of style images is sampled from the plurality of style images; and after updating the initial model based on the first batch of style images, updating the  initial model based on a second batch of style images, wherein the second batch of style images is sampled from the plurality of style images.
In some embodiments, updating the initial model based on the first batch of style images comprises: obtaining an outer loss corresponding to the first batch of style images; and updating the initial model based on the outer loss.
In some embodiments, updating the initial model based on the outer loss comprises updating one or more parameters of the initial model according to:
θ ← θ − η·∇_θ E
wherein θ represents the one or more parameters of the initial model, E is based on the outer loss corresponding to the first batch of style images, and η represents an outer learning rate.
In some embodiments, obtaining the outer loss corresponding to the first batch of style images comprises: performing a first training, wherein the first training comprises updating the initial model based on a first style image of the first batch of style images; after the first training, calculating a first outer loss corresponding to the first style image of the first batch of style images; performing a second training, wherein the second training comprises updating the initial model based on a second style image of the first batch of style images; after the second training, calculating a second outer loss corresponding to the second style image of the first batch of style images; and calculating the outer loss corresponding to the first batch of style images based on the first outer loss and the second outer loss.
In some embodiments, the method further comprises aggregating the first outer loss and the second outer loss.
In some embodiments, the method further comprises: averaging the first outer loss and the second outer loss.
In some embodiments, calculating the first outer loss comprises: sampling a content image; after the first training, obtaining an adapted image corresponding to the sampled content image based on the initial model; and calculating a perceptual loss based on the sampled content image and the adapted image.
In some embodiments, the content image is sampled from a validation set of content images.
In some embodiments, the perceptual loss is calculated based on a content loss between the sampled content image and the adapted image and a style loss between a style image and the adapted image.
In some embodiments, performing the first training comprises: sampling a first batch of content images; calculating a first inner loss corresponding to the first batch of content images; updating the initial model based on the first inner loss.
In some embodiments, performing the first training further comprises: sampling a second batch of content images; calculating a second inner loss corresponding to the second batch of content images; and updating the initial model based on the second inner loss.
In some embodiments, the first batch of content images and the second batch of content images are sampled from a training set of content images.
In some embodiments, updating the initial model based on the first inner loss comprises updating one or more parameters of the initial model according to:
w ← w − δ·∇_w L
wherein w represents the one or more parameters, L is based on the first inner loss, and δ represents an inner learning rate.
In some embodiments, an exemplary computer-enabled method for generating an artistic style transfer model comprises updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
The operations described above with reference to FIG. 6 are optionally implemented by components depicted in FIG. 7. It would be clear to a person having ordinary skill in the art how other processes are implemented based on the components depicted in FIG. 7.
FIG. 7 illustrates an example of a computing device in accordance with one embodiment. Device 700 can be a host computer connected to a network. Device 700 can be a client computer  or a server. As shown in FIG. 7, device 700 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server or handheld computing device (portable electronic device) such as a phone or tablet. The device can include, for example, one or more of processor 710, input device 720, output device 730, storage 740, and communication device 760. Input device 720 and output device 730 can generally correspond to those described above, and can either be connectable or integrated with the computer.
Input device 720 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device. Output device 730 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
Storage 740 can be any suitable device that provides storage, such as an electrical, magnetic or optical memory including a RAM, cache, hard drive, or removable storage disk. Communication device 760 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device. The components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly.
Software 750, which can be stored in storage 740 and executed by processor 710, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the devices as described above) .
Software 750 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 740, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
Software 750 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate or transport programming for use by or in connection with an instruction execution system, apparatus, or  device. The transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic or infrared wired or wireless propagation medium.
Device 700 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
Device 700 can implement any operating system suitable for operating on the network. Software 750 can be written in any suitable programming language, such as C, C++, Java or Python. In various embodiments, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.
Although the disclosure and examples have been fully described with reference to the accompanying figures, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the disclosure and examples as defined by the claims.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the techniques and their practical applications. Others skilled in the art are thereby enabled to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (22)

  1. A computer-enabled method for generating an artistic style transfer model, the method comprising:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model;
    receiving a first style image in a first style from a user;
    based on the first style image, updating a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted content image in the first style;
    receiving a second style image in a second style from the user; and
    based on the second style image, updating a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted content image in the second style.
  2. The method according to claim 1, wherein the initial model is a neural network.
  3. The method according to any of claims 1-2, wherein updating the first instance of the neutral-style transfer model to generate the first style transfer model comprises:
    updating the initial model based on a first batch of style images, wherein the first batch of style images is sampled from the plurality of style images; and
    after updating the initial model based on the first batch of style images, updating the initial model based on a second batch of style images, wherein the second batch of style images is sampled from the plurality of style images.
  4. The method according to claim 3, wherein updating the initial model based on the first batch of style images comprises:
    obtaining an outer loss corresponding to the first batch of style images; and
    updating the initial model based on the outer loss.
  5. The method according to claim 4, wherein updating the initial model based on the outer loss comprises updating one or more parameters of the initial model according to:
    θ ← θ − η·∇_θ E
    wherein θ represents the one or more parameters of the initial model, E is based on the outer loss corresponding to the first batch of style images, and η represents an outer learning rate.
  6. The method according to any of claims 4-5, wherein obtaining the outer loss corresponding to the first batch of style images comprises:
    performing a first training, wherein the first training comprises updating the initial model based on a first style image of the first batch of style images;
    after the first training, calculating a first outer loss corresponding to the first style image of the first batch of style images;
    performing a second training, wherein the second training comprises updating the initial model based on a second style image of the first batch of style images;
    after the second training, calculating a second outer loss corresponding to the second style image of the first batch of style images; and
    calculating the outer loss corresponding to the first batch of style images based on the first outer loss and the second outer loss.
  7. The method according to claim 6, further comprising: aggregating the first outer loss and the second outer loss.
  8. The method according to claim 6, further comprising: averaging the first outer loss and the second outer loss.
  9. The method according to any of claims 6-8, wherein calculating the first outer loss comprises:
    sampling a content image;
    after the first training, obtaining an adapted image corresponding to the sampled content image based on the initial model; and
    calculating a perceptual loss based on the sampled content image and the adapted image.
  10. The method according to claim 9, wherein the content image is sampled from a validation set of content images.
  11. The method according to any of claims 9-10, wherein the perceptual loss is calculated based on a content loss between the sampled content image and the adapted image and a style loss between a style image and the adapted image.
  12. The method according to any of claims 6-11, wherein performing the first training comprises:
    sampling a first batch of content images;
    calculating a first inner loss corresponding to the first batch of content images;
    updating the initial model based on the first inner loss.
  13. The method according to claim 12, wherein performing the first training further comprises:
    sampling a second batch of content images;
    calculating a second inner loss corresponding to the second batch of content images; and
    updating the initial model based on the second inner loss.
  14. The method according to any of claims 12-13, wherein the first batch of content images and the second batch of content images are sampled from a training set of content images.
  15. The method according to claim 12, wherein updating the initial model based on the first inner loss comprises updating one or more parameters of the initial model according to:
    w ← w − δ·∇_w L
    wherein w represents the one or more parameters, L is based on the first inner loss, and δ represents an inner learning rate.
  16. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:
    update an initial model based on a plurality of style images to obtain a neutral-style transfer model;
    receive a first style image in a first style;
    based on the first style image, update a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style;
    receive a second style image in a second style; and
    based on the second style image, update a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
  17. A system, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model;
    receiving a first style image in a first style;
    based on the first style image, updating a first instance of the neutral-style transfer model to generate a first style transfer model, wherein the first style transfer model is configured to receive a first content image and output, via a display, an adapted image in the first style;
    receiving a second style image in a second style; and
    based on the second style image, updating a second instance of the neutral-style transfer model to generate a second style transfer model, wherein the second style transfer model is configured to receive a second content image and output, via the display, an adapted image in the second style.
  18. A computer-enabled method for generating an artistic style transfer model, the method comprising:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and
    based on a style image, updating an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted content image in a style of the style image.
  19. The method according to claim 18, wherein the bi-level optimization process comprises:
    updating the initial model based on an outer loss corresponding to a batch of style images from the plurality of style images.
  20. The method according to claim 19, wherein the bi-level optimization process comprises:
    updating the initial model based on an inner loss corresponding to a style image from the batch of style images.
  21. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:
    update an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and
    based on a style image provided by a user, update an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.
  22. A system, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
    updating an initial model based on a plurality of style images to obtain a neutral-style transfer model, wherein updating the initial model comprises a bi-level optimization process; and
    updating, based on a style image provided by a user, an instance of the neutral-style transfer model to generate a style transfer model, wherein the style transfer model is configured to receive a content image and provide an adapted image in a style of the style image.