v0.2.0 by mkshing · Pull Request #5 · mkshing/svdiff-pytorch · GitHub

v0.2.0 #5


Merged
mkshing merged 4 commits into main from v0.2.0
Apr 12, 2023

@mkshing (Owner) commented Apr 12, 2023

What's changed

Released v0.2.0

Improved the following parts based on feedback from the author, @phymhan (#3):

  • Train spectral shifts for 1-D weights such as LayerNorm too (file size: 935 kB; before: 923 kB).
  • Use a different learning rate for 1-D weights via --learning_rate_1d.
  • Additionally, train spectral shifts of the text encoder with --train_text_encoder (file size: 1.17 MB).
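
For readers new to the term, a spectral shift can be written as below. This is a sketch based on the SVDiff formulation, not on this repository's code; the symbols $U$, $\sigma$, $V$ and $\delta$ are notation introduced here, and how 1-D weights such as LayerNorm are handled is not spelled out in this PR.

$$
W = U \,\mathrm{diag}(\sigma)\, V^\top, \qquad W_\delta = U \,\mathrm{diag}\big(\mathrm{ReLU}(\sigma + \delta)\big)\, V^\top
$$

Only the shift vector $\delta$ is trained while $U$, $\sigma$ and $V$ stay frozen, which is consistent with the checkpoint sizes of about 1 MB quoted above.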

With these changes, you get better results with fewer training steps than the first release, v0.1.1.
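
The separate learning rate set by --learning_rate_1d can be pictured as two optimizer parameter groups. The sketch below is only an illustration: `build_param_groups` and the parameter names are hypothetical, and it groups names rather than tensors so that it runs without PyTorch. The returned list follows the per-parameter-group layout that PyTorch optimizers accept (`{"params": ..., "lr": ...}`); whether train_svdiff.py builds its groups this way is an assumption.

```python
def build_param_groups(base_weight_shapes, learning_rate, learning_rate_1d):
    """Split spectral-shift parameters into two groups by the shape of the
    weight they belong to (hypothetical helper, not part of svdiff-pytorch).

    base_weight_shapes maps a parameter name to the shape of its base weight.
    """
    names_1d = [n for n, shape in base_weight_shapes.items() if len(shape) == 1]
    names_nd = [n for n, shape in base_weight_shapes.items() if len(shape) != 1]
    return [
        {"params": names_nd, "lr": learning_rate},      # e.g. linear / conv weights
        {"params": names_1d, "lr": learning_rate_1d},   # e.g. LayerNorm weights
    ]


# Hypothetical parameter names, using the learning rates from the commands in this PR.
groups = build_param_groups(
    {"unet.attn.to_q.weight": (320, 320), "unet.norm.weight": (320,)},
    learning_rate=1e-3,
    learning_rate_1d=1e-6,
)
for group in groups:
    print(group["lr"], group["params"])
```

In a real training script the names would be replaced by the parameter tensors themselves before passing the list to the optimizer.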

Example

accelerate launch svdiff-pytorch-2/train_svdiff.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir=$INSTANCE_DATA_DIR \
  --class_data_dir=$CLASS_DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="photo of sks woman" \
  --class_prompt="photo of a woman" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-3 \
  --learning_rate_1d=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --checkpointing_steps=200 \
  --max_train_steps=1000 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --seed=42 \
  --gradient_checkpointing
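
The flags --with_prior_preservation and --prior_loss_weight combine two losses. The sketch below assumes train_svdiff.py follows the usual DreamBooth-style objective (instance loss plus a weighted loss on class images), which this PR does not confirm. The function names are hypothetical, and plain Python lists stand in for noise-prediction tensors.

```python
def mse(pred, target):
    """Mean squared error over two equally long lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(instance_pred, instance_target,
                            class_pred, class_target,
                            prior_loss_weight=1.0):
    """Instance loss plus weighted prior (class-image) loss."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Toy numbers: the instance MSE is 0.5 and the class MSE is 1.0.
total = prior_preservation_loss([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], [1.0, 1.0],
                                prior_loss_weight=1.0)
print(total)
```

With --prior_loss_weight=1.0, as in the command above, both terms contribute equally.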

Result for the prompt "portrait of sks woman wearing kimono", where sks indicates Gal Gadot:
[image]

Added Single Image Editing

sample script
training

accelerate launch train_svdiff.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="pink-chair-dir" \
  --output_dir="output-dir" \
  --instance_prompt="photo of a pink chair with black legs" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-3 \
  --learning_rate_1d=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=500 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --seed=42 \
  --gradient_checkpointing

inference

import sys
import torch
from PIL import Image
from diffusers import DDIMScheduler
sys.path.append("/content/svdiff-pytorch-2")
from svdiff_pytorch import load_unet_for_svdiff, load_text_encoder_for_svdiff, StableDiffusionPipelineWithDDIMInversion

pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
spectral_shifts_ckpt_dir = "/content/SIE/checkpoint-500"
image = "pink-chair.jpeg"
source_prompt = "photo of a pink chair with black legs"
target_prompt = "photo of a blue chair with black legs"

unet = load_unet_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="unet")
text_encoder = load_text_encoder_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="text_encoder")
# load pipe
pipe = StableDiffusionPipelineWithDDIMInversion.from_pretrained(
    pretrained_model_name_or_path,
    unet=unet,
    text_encoder=text_encoder,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# DDIM inversion is not used in this example, so the pipeline samples from random latents.
inv_latents = None
# (Optional) DDIM inversion: uncomment the lines below to start from the inverted latents of the input image.
# image = Image.open(image).convert("RGB").resize((512, 512))
# SVDiff uses guidance_scale=1 for DDIM inversion.
# inv_latents = pipe.invert(source_prompt, image=image, guidance_scale=1.0).latents
image = pipe(target_prompt, latents=inv_latents).images[0]

[image] Result for the target prompt "photo of a blue chair with black legs" (source prompt: "photo of a pink chair with black legs").

* The input image was taken from https://unsplash.com/photos/1JJJIHh7-Mk
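
To give an intuition for the optional DDIM inversion step in the inference script, here is a scalar toy version of deterministic DDIM (eta = 0). It is not the implementation behind `pipe.invert`: `ddim_step`, `invert`, `sample`, the schedule `alphas` and the constant noise predictor `toy_eps` are all hypothetical names chosen for this sketch.

```python
import math


def ddim_step(x, eps, alpha_from, alpha_to):
    """One deterministic DDIM update between two cumulative-alpha levels."""
    x0_pred = (x - math.sqrt(1.0 - alpha_from) * eps) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * eps


def invert(x0, eps_model, alphas):
    """Walk from the clean sample toward noise (alphas in decreasing order)."""
    x = x0
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, eps_model(x, a_from), a_from, a_to)
    return x


def sample(x_T, eps_model, alphas):
    """Walk from noise back to a clean sample along the reversed schedule."""
    x = x_T
    reversed_alphas = alphas[::-1]
    for a_from, a_to in zip(reversed_alphas[:-1], reversed_alphas[1:]):
        x = ddim_step(x, eps_model(x, a_from), a_from, a_to)
    return x


alphas = [0.999, 0.9, 0.7, 0.4, 0.1]   # toy schedule, clean -> noisy


def toy_eps(x, alpha):
    """Stand-in for the UNet noise prediction; constant for simplicity."""
    return 0.3


x0 = 0.5
x_T = invert(x0, toy_eps, alphas)      # "inverted latent"
x_rec = sample(x_T, toy_eps, alphas)   # sampling from it recovers x0
print(x_T, x_rec)
```

With a constant noise predictor the round trip is exact up to floating-point error. With a real UNet the prediction depends on the latent and the timestep, so inversion only approximately recovers the input image.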

TODO

  • Add SIE result
  • Update colab notebook
  • Update gradio

a8ed9fa

@mkshing added the enhancement (New feature or request) label Apr 12, 2023
@mkshing linked an issue that may be closed by this pull request Apr 12, 2023
@mkshing marked this pull request as ready for review April 12, 2023 13:37
@mkshing merged commit 9199552 into main Apr 12, 2023
@mkshing deleted the v0.2.0 branch April 12, 2023 13:37
Successfully merging this pull request may close these issues.

edit a real picture