Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Meissonic is a non-autoregressive text-to-image synthesis model based on masked image modeling (MIM). It generates high-resolution images and is designed to run on consumer graphics cards.
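To make "non-autoregressive masked image modeling" concrete, here is a toy sketch of the general iterative decoding idea used by masked generative transformers: start from fully masked tokens, predict every position in parallel, and keep only the most confident predictions at each step. This is not Meissonic's actual sampler. The `toy_predict` function is a hypothetical stand-in for the transformer, and the cosine schedule, sequence length, and vocabulary size are assumptions chosen for illustration.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a still-masked token position


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for a transformer: one (token, confidence) guess per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_decode(seq_len=16, vocab_size=8, steps=4, seed=0):
    """Fill a fully masked sequence over a fixed number of parallel steps."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(1, steps + 1):
        preds = toy_predict(tokens, vocab_size, rng)
        # Assumed cosine schedule: how many positions stay masked after this step.
        keep_masked = int(seq_len * math.cos(math.pi / 2 * step / steps))
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident predictions first.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        n_unmask = max(0, len(masked) - keep_masked)
        for i in masked[:n_unmask]:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    print(masked_decode())
```

Unlike autoregressive decoding, which emits one token per forward pass, this loop finishes in a fixed number of steps regardless of sequence length.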
Note: This is a project under development. If you encounter any specific performance issues or find significant discrepancies with the results reported in the paper, please submit an issue on the GitHub repository! Thank you for your support!
pip install accelerate pytorch-lightning torch torchvision tqdm transformers diffusers numpy gradio --extra-index-url https://download.pytorch.org/whl/cu124
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
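After installation, a quick check can confirm that the packages from the pip command are importable before running the scripts. This is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical and not part of this repository, and the list uses import names, which is why `pytorch-lightning` appears as `pytorch_lightning`.

```python
from importlib.util import find_spec

# Import names for the packages listed in the pip command above.
REQUIRED = [
    "accelerate", "pytorch_lightning", "torch", "torchvision",
    "tqdm", "transformers", "diffusers", "numpy", "gradio",
]


def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only verifies that the modules can be located. It does not check versions or whether a CUDA-enabled PyTorch build was installed.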
python inference.py
python inpaint.py --mode inpaint
python inpaint.py --mode outpaint
Prompt: "A pillow with a picture of a Husky on it."
Prompt: "A white coffee mug, a solid black background"
If you find this work helpful, please consider citing:
@article{bai2024meissonic,
title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
journal={arXiv preprint arXiv:2410.08261},
year={2024}
}