This repo contains pre-trained model weights and training/sampling PyTorch (torch>=2.1.0) code used in
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan
HKU, ByteDance
You can find more visualizations on the project page.
- [2024.10.23] Code is being prepared!
Hugging Face download: https://huggingface.co/Qwen/Qwen2.5-1.5B
Image tokenizer (VQ-VAE):

Method | params | tokens | data | weight |
---|---|---|---|---|
vq_ds16_t2i | 72M | 16x16 | LAION COCO (50M) + internal data (10M) | vq_ds16_t2i.pt |
AR models for text-conditional image generation:

Method | params | tokens | data | weight |
---|---|---|---|---|
LlamaGen-XL | 775M | 16x16 | LAION COCO (50M) | t2i_XL_stage1_256.pt |
LlamaGen-XL | 775M | 32x32 | internal data (10M) | t2i_XL_stage2_512.pt |
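
The demo below expects these checkpoints under ./pretrained_models. If the weights are hosted on Hugging Face, a minimal download sketch might look like the following; note that the repo id here is a placeholder assumption, not an official location, so substitute the actual repository hosting the files.

```python
# Hedged sketch: fetch the checkpoints listed above into ./pretrained_models.
# "your-org/llamagen-checkpoints" is a PLACEHOLDER repo id (an assumption),
# not the official hosting location -- replace it with the real one.
from huggingface_hub import hf_hub_download

CHECKPOINTS = [
    "vq_ds16_t2i.pt",
    "t2i_XL_stage1_256.pt",
    "t2i_XL_stage2_512.pt",
]

for filename in CHECKPOINTS:
    hf_hub_download(
        repo_id="your-org/llamagen-checkpoints",  # placeholder, see note above
        filename=filename,
        local_dir="./pretrained_models",  # matches the folder the demo expects
    )
```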
Before running the demo, please refer to the language readme to install the required packages and language models.
Please download the models, put them in the folder ./pretrained_models, and run:
python3 autoregressive/sample/sample_t2i.py --vq-ckpt ./pretrained_models/vq_ds16_t2i.pt --gpt-ckpt ./pretrained_models/t2i_XL_stage1_256.pt --gpt-model GPT-XL --image-size 256
# or
python3 autoregressive/sample/sample_t2i.py --vq-ckpt ./pretrained_models/vq_ds16_t2i.pt --gpt-ckpt ./pretrained_models/t2i_XL_stage2_512.pt --gpt-model GPT-XL --image-size 512
The generated images will be saved to sample_t2i.png.
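
If you prefer to drive both resolutions from a single script, a small wrapper over the exact commands above works; nothing here beyond the subprocess plumbing is new.

```python
# Sketch: run the 256px and 512px text-to-image demos back to back,
# using the same checkpoints and flags as the commands above.
import subprocess

RUNS = [
    ("t2i_XL_stage1_256.pt", 256),
    ("t2i_XL_stage2_512.pt", 512),
]

for gpt_ckpt, image_size in RUNS:
    subprocess.run(
        [
            "python3", "autoregressive/sample/sample_t2i.py",
            "--vq-ckpt", "./pretrained_models/vq_ds16_t2i.pt",
            "--gpt-ckpt", f"./pretrained_models/{gpt_ckpt}",
            "--gpt-model", "GPT-XL",
            "--image-size", str(image_size),
        ],
        check=True,  # stop if a run fails
    )
```

Keep in mind that both runs write sample_t2i.png, so the second run will overwrite the first unless you move or rename the output in between.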
We use the serving framework vLLM to achieve higher throughput. Please refer to the serving readme to install the required packages, then run:
python3 autoregressive/serve/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_XXL_384.pt --gpt-model GPT-XXL --from-fsdp --image-size 384
The generated images will be saved to sample_c2i_vllm.png.
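
To sanity-check the throughput gain yourself, one simple (if coarse) approach is to time the serving run end to end; the command below is copied verbatim from above.

```python
# Sketch: coarse end-to-end timing of the vLLM-served sampling run.
# Note this includes model loading, so it understates the per-image
# throughput improvement from vLLM.
import subprocess
import time

cmd = [
    "python3", "autoregressive/serve/sample_c2i.py",
    "--vq-ckpt", "./pretrained_models/vq_ds16_c2i.pt",
    "--gpt-ckpt", "./pretrained_models/c2i_XXL_384.pt",
    "--gpt-model", "GPT-XXL",
    "--from-fsdp",
    "--image-size", "384",
]

start = time.perf_counter()
subprocess.run(cmd, check=True)
print(f"end-to-end sampling took {time.perf_counter() - start:.1f}s")
```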
See Getting Started for installation, training and evaluation.
The majority of this project is licensed under the MIT License. Portions of the project are available under the separate licenses of the referenced projects, as detailed in the corresponding files.
@article{sun2024autoregressive,
title={Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation},
author={Sun, Peize and Jiang, Yi and Chen, Shoufa and Zhang, Shilong and Peng, Bingyue and Luo, Ping and Yuan, Zehuan},
journal={arXiv preprint arXiv:2406.06525},
year={2024}
}