Re: Full-Rank Training for Sana 2k model on Human Image Dataset #120

gitanon112 · 2024-12-27T23:35:07Z

Hello,

Thank you for your great work with SANA, it's incredible. I just had a couple of questions:

1) If my goal is to make the SANA 2K model (which currently struggles with human image generation) better at generating human images, is it as simple as full-rank training on a dataset of high-quality human image-text pairs? Or is it more complex than this?
2) Am I correct in believing that https://github.com/NVlabs/Sana/blob/main/train_scripts/train.py and https://github.com/NVlabs/Sana/blob/main/train_scripts/train.sh are the full-rank training scripts for SANA, while https://github.com/NVlabs/Sana/blob/main/train_scripts/train_dreambooth_lora_sana.py and https://github.com/NVlabs/Sana/blob/main/train_scripts/train_lora.sh are for DreamBooth LoRA training?
3) SimpleTuner has a guide for full-rank training of the Sana model (https://github.com/bghira/SimpleTuner/blob/main/documentation/quickstart/SANA.md). Should we use it over the scripts you have provided? Will training be faster/"better" if we use the SimpleTuner approach?
4) When training the 2K model, does this mean the dataset/images we train on should be of 2K resolution only? Sorry if it's a dumb question.

Thanks so much for the help, really appreciate all your hard work :)

shaun-ba · 2024-12-28T07:48:53Z

There is no way a model should even exist if it cannot generate humans. If this isn't fixed in the 4K model and all future revisions of the lower-res models, then nobody will use it. In my opinion they will fix this, they just need time.

nitinmukesh · 2024-12-28T08:18:21Z

It is already mentioned in the roadmap: version 1.5 will have better human image generation.
https://github.com/NVlabs/Sana#to-do-list

Regarding question 4, it is answered in #112.

lawrence-cj · 2025-01-02T13:32:34Z

> 1) If my goal is to make the SANA 2K model (which currently struggles with human image generation) better at generating human images, is it as simple as full-rank training on a dataset of high-quality human image-text pairs? Or is it more complex than this?
> 2) Am I correct in believing that https://github.com/NVlabs/Sana/blob/main/train_scripts/train.py and https://github.com/NVlabs/Sana/blob/main/train_scripts/train.sh are the full-rank training scripts for SANA, while https://github.com/NVlabs/Sana/blob/main/train_scripts/train_dreambooth_lora_sana.py and https://github.com/NVlabs/Sana/blob/main/train_scripts/train_lora.sh are for DreamBooth LoRA training?

Correct. Training the 2K model is exactly the same as training at other resolutions. The only difference is that it needs more GPU memory.
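As a rough illustration of why 2K needs more memory, the sketch below counts the latent tokens the transformer processes per image. It assumes a 32x autoencoder downsampling factor and a patch size of 1, as described in the Sana paper; neither value is stated in this thread, and the function name `latent_tokens` is hypothetical. Actual memory use also depends on batch size, precision, and optimizer state.

```python
def latent_tokens(image_size: int, ae_downsample: int = 32, patch_size: int = 1) -> int:
    """Tokens per square image after autoencoder downsampling and patchification.

    ae_downsample=32 and patch_size=1 are assumptions taken from the Sana paper,
    not values confirmed in this thread.
    """
    side = image_size // (ae_downsample * patch_size)
    return side * side


if __name__ == "__main__":
    for size in (512, 1024, 2048):
        print(f"{size}px -> {latent_tokens(size)} tokens")
```

Under these assumptions a 2048px image yields four times as many tokens as a 1024px image, so per-sample activation memory grows accordingly.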

lawrence-cj · 2025-01-02T13:33:26Z

> SimpleTuner has a guide for full-rank training of the Sana model (https://github.com/bghira/SimpleTuner/blob/main/documentation/quickstart/SANA.md). Should we use it over the scripts you have provided? Will training be faster/"better" if we use the SimpleTuner approach?

I haven't tested the training script from SimpleTuner personally. No comment about the performance.

lawrence-cj · 2025-01-02T13:33:59Z

> When training the 2K model, does this mean the dataset/images we train on should be of 2K resolution only? Sorry if it's a dumb question

Correct. Better to use 2K images for better performance.
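One way to act on this is to filter the dataset by image size before training. The sketch below is a minimal example, assuming "2K" means a shorter side of at least 2048 pixels and that image dimensions have already been read into `(name, width, height)` tuples; the function name `keep_high_res` and the threshold are hypothetical, not part of the Sana scripts.

```python
from typing import Iterable, List, Tuple


def keep_high_res(
    images: Iterable[Tuple[str, int, int]], min_side: int = 2048
) -> List[str]:
    """Return names of images whose shorter side is at least min_side pixels.

    min_side=2048 is an assumed definition of "2K", not a value from the thread.
    """
    return [name for name, width, height in images if min(width, height) >= min_side]


if __name__ == "__main__":
    sample = [("a.jpg", 4000, 3000), ("b.jpg", 1920, 1080), ("c.jpg", 2048, 2048)]
    print(keep_high_res(sample))
```

Filtering on the shorter side avoids upscaling low-resolution images to reach the training resolution.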

gitanon112 · 2025-01-04T03:27:34Z

Thank you!!

lawrence-cj added the Answered label Jan 2, 2025

gitanon112 closed this as completed Jan 4, 2025
