v0.2.0 #5

mkshing · 2023-04-12T07:57:12Z

What's changed

Released v0.2.0

The following improvements are based on feedback from the author @phymhan (#3):

Spectral shifts are now also trained for 1-D weights such as LayerNorm (file size: 935kB, up from 923kB).
1-D weights can use a separate learning rate via --learning_rate_1d.
Spectral shifts of the text encoder can additionally be trained with --train_text_encoder (file size: 1.17MB).

With these changes, you get better results in fewer training steps than with the first release, v0.1.1.
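To make the 1-D case concrete, here is a minimal sketch of what a spectral shift on a 1-D weight could look like. It assumes the 1-D weight is treated as a column matrix of shape (w, 1) and that the shifted singular value is passed through a ReLU, as in the SVDiff formulation; the function name `shift_1d_weight` is hypothetical and is not part of this repository. A column matrix has a single singular value (its Euclidean norm), so under this assumption each 1-D weight adds one trainable scalar.

```python
import math


def shift_1d_weight(weight, delta):
    """Apply a spectral shift to a 1-D weight viewed as a (w, 1) matrix.

    Hypothetical helper for illustration only. For a column matrix the SVD
    is U = weight / norm, S = [norm], Vh = [[1.0]], so shifting the single
    singular value simply rescales the vector.
    """
    sigma = math.sqrt(sum(w * w for w in weight))
    if sigma == 0.0:
        # Simplification: an all-zero vector has no direction to rescale.
        return list(weight)
    # ReLU keeps the shifted singular value non-negative.
    shifted = max(sigma + delta, 0.0)
    return [w / sigma * shifted for w in weight]
```

With `delta = 0.0` the weight is returned unchanged, which matches the idea that training starts from the pretrained weights and learns only the shift.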

Example training command

accelerate launch svdiff-pytorch-2/train_svdiff.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir=$INSTANCE_DATA_DIR \
  --class_data_dir=$CLASS_DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="photo of sks woman" \
  --class_prompt="photo of a woman" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-3 \
  --learning_rate_1d=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --checkpointing_steps=200 \
  --max_train_steps=1000 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --seed=42 \
  --gradient_checkpointing
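The command above passes two learning rates, --learning_rate and --learning_rate_1d. As a rough sketch of how such a split could be organized, the hypothetical helper below groups parameter names by the rank of the underlying weight and attaches a learning rate to each group. The function name, the example parameter names, and the rule of selecting by shape are all assumptions for illustration; the actual training script may select parameters differently (for example by name).

```python
def split_param_groups(named_shapes, lr, lr_1d):
    """Split parameter names into two groups by the rank of their shape.

    Hypothetical helper: `named_shapes` maps a parameter name to the shape
    of the weight whose spectral shift is trained.
    """
    names_1d = [name for name, shape in named_shapes.items() if len(shape) == 1]
    names_nd = [name for name, shape in named_shapes.items() if len(shape) != 1]
    return [
        {"params": names_nd, "lr": lr},     # shifts for matrix/conv weights
        {"params": names_1d, "lr": lr_1d},  # shifts for 1-D weights
    ]


# Example with made-up parameter names and shapes.
groups = split_param_groups(
    {
        "attn.to_q.weight": (320, 320),
        "norm1.weight": (320,),
        "conv_in.weight": (320, 4, 3, 3),
    },
    lr=1e-3,
    lr_1d=1e-6,
)
```

PyTorch optimizers accept the same list-of-dicts structure for per-group options, with tensors in "params" instead of names.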

"portrait of sks woman wearing kimono" where sks indicates Gal Gadot.

Added Single Image Editing

sample script
training

accelerate launch train_svdiff.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="pink-chair-dir" \
  --output_dir="output-dir" \
  --instance_prompt="photo of a pink chair with black legs" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-3 \
  --learning_rate_1d=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=500 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --seed=42 \
  --gradient_checkpointing

inference

import sys
import torch
from PIL import Image
from diffusers import DDIMScheduler
sys.path.append("/content/svdiff-pytorch-2")
from svdiff_pytorch import load_unet_for_svdiff, load_text_encoder_for_svdiff, StableDiffusionPipelineWithDDIMInversion

pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
spectral_shifts_ckpt_dir = "/content/SIE/checkpoint-500"
image = "pink-chair.jpeg"
source_prompt = "photo of a pink chair with black legs"
target_prompt = "photo of a blue chair with black legs"

unet = load_unet_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="unet")
text_encoder = load_text_encoder_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="text_encoder")
# load pipe
pipe = StableDiffusionPipelineWithDDIMInversion.from_pretrained(
    pretrained_model_name_or_path,
    unet=unet,
    text_encoder=text_encoder,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# In this example, DDIM inversion is not used.
inv_latents = None
# (Optional) DDIM inversion:
# image = Image.open(image).convert("RGB").resize((512, 512))
# SVDiff uses guidance_scale=1 for DDIM inversion.
# inv_latents = pipe.invert(source_prompt, image=image, guidance_scale=1.0).latents
image = pipe(target_prompt, latents=inv_latents).images[0]

"photo of a ~~pink~~ blue chair with black legs"

* The input image was taken from https://unsplash.com/photos/1JJJIHh7-Mk

TODO

Add SIE result
Update colab notebook
Update gradio a8ed9fa

mkshing added 2 commits April 12, 2023 16:52

first commit of v0.2.0

4edf103

fix readme

1d55ee1

mkshing added the enhancement New feature or request label Apr 12, 2023

mkshing linked an issue Apr 12, 2023 that may be closed by this pull request

Thanks for your reimplementation #3

Closed

mkshing removed a link to an issue Apr 12, 2023

Thanks for your reimplementation #3

Closed

mkshing linked an issue Apr 12, 2023 that may be closed by this pull request

edit a real picture #4

Closed

mkshing added 2 commits April 12, 2023 22:36

minor fix

e1c57fe

fix colab

c645374

mkshing marked this pull request as ready for review April 12, 2023 13:37

mkshing merged commit 9199552 into main Apr 12, 2023

mkshing deleted the v0.2.0 branch April 12, 2023 13:37
