dasjoms/BagelUI: A rework of the gradio WebUI for the open-source unified multimodal model by ByteDance


BAGEL


BAGEL Gradio UI Fork with Extended Features

This is a fork of the official BAGEL project's Gradio WebUI, incorporating several quality-of-life improvements and features.

Based on the original work by Chaorui Deng* et al.

Latest Update (June 03, 2025)

This update adds support for DFloat11-compressed BAGEL models and improves model management, flexibility, and inference speed within BagelUI:

  • DFloat11 Compressed Model Support:
    • Integrated full support for loading and running the DFloat11-compressed version of the BAGEL model.
  • Dynamic Model Loading & Switching:
    • Introduced a new ⚙️ Models tab, allowing dynamic loading and switching between different BAGEL model checkpoints and quantizations.
  • Inference Optimizations:
    • Disabled gradient tracking during inference to reduce memory overhead and speed up operations.

Special thanks to this repo for the original inference implementation of the DFloat11 model: https://github.com/LeanModels/Bagel-DFloat11/

The BagelUI-Colab.ipynb Jupyter Notebook has also been updated.

✨ Added Features

This fork builds upon the original BAGEL Gradio UI by adding the following functionalities:

  • Structured Image Saving: Automatically saves all generated and edited images to a configurable output directory (output/ by default) with a clear folder structure based on the tab and mode used (Text-to-Image, Image Edit Standard, Image Edit Task Breakdown projects, X/Y Plot runs).
  • Batch Image Generation & Editing: Use the Batch Size slider in the Text-to-Image and Image Edit tabs to generate multiple images sequentially with varying seeds (or a fixed seed if specified).
  • LLM-Powered Task Breakdown for Editing (Experimental):
    • An experimental mode in the Image Edit tab (Enable Task Breakdown) that leverages the built-in Qwen2 LLM to break down a complex editing prompt into sequential sub-steps.
    • These sub-steps are applied one after another to the image.
  • X/Y Plotting:
    • A dedicated X/Y Plot menu in the Text-to-Image and Image Edit tabs.
    • Allows selecting up to two hyperparameters (X and Y axes) and providing comma-separated values for each.
    • Generates an image for every combination of the selected parameter values.
    • Includes a Prompt S/R (Search/Replace) axis parameter, akin to the feature of the same name in Automatic1111's Stable Diffusion web UI: it searches for a string in the prompt and replaces it with each of the comma-separated values.
    • Assembles the generated images into a single grid with axis labels indicating the parameter values used for each row/column.
  • Batch Image Understanding/Captioning:
    • Adds an Input Mode button to the Image Understanding tab for switching between single image and batch processing.
    • The Batch (Files/ZIP) mode accepts multiple image files or a single ZIP file containing images.
    • Processes each image file in the batch sequentially, saving the understanding result for each into a corresponding .txt file.
    • Provides a downloadable ZIP file containing all processed images and their generated .txt files.
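
The X/Y Plotting behavior described above (comma-separated axis values, one image per combination, Prompt S/R) can be sketched roughly as follows. This is an illustration only, not the fork's actual code: `parse_axis`, `build_grid`, the `prompt_sr` axis key, and the `cfg_text_scale` name used in the example are hypothetical, and the convention that the first Prompt S/R value doubles as the search string is assumed from Automatic1111's behavior.

```python
from itertools import product


def parse_axis(raw: str) -> list[str]:
    """Split a comma-separated axis field into trimmed, non-empty values."""
    return [value.strip() for value in raw.split(",") if value.strip()]


def build_grid(prompt: str, x_name: str, x_raw: str,
               y_name: str, y_raw: str) -> list[dict]:
    """Return one job per grid cell, row by row (Y picks the row, X the column)."""
    x_vals, y_vals = parse_axis(x_raw), parse_axis(y_raw)
    jobs = []
    for y, x in product(y_vals, x_vals):
        job = {"prompt": prompt, "row_label": y, "col_label": x}
        for name, value, axis_vals in ((x_name, x, x_vals), (y_name, y, y_vals)):
            if name == "prompt_sr":
                # Assumed convention: the first value is the search string,
                # so the first cell keeps the prompt unchanged.
                job["prompt"] = job["prompt"].replace(axis_vals[0], value)
            else:
                # Any other axis is treated as a numeric hyperparameter.
                job[name] = float(value)
        jobs.append(job)
    return jobs


# Hypothetical example: 3 prompt variants x 2 guidance values = 6 jobs.
jobs = build_grid("a photo of a cat", "prompt_sr", "cat, dog, fox",
                  "cfg_text_scale", "2.0, 4.0")
for job in jobs:
    print(job["row_label"], job["col_label"], job["prompt"])
```

Each job would then be rendered once, and the resulting images placed in the grid using `row_label` and `col_label` as the axis labels.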

⚙️ Local Installation

(A Jupyter notebook, 'BagelUI-colab.ipynb', is also provided for easy cloud use; an L4 GPU is enough to run the DFloat11 model.)

  1. Clone this fork:

    git clone https://github.com/dasjoms/BagelUI.git
    cd BagelUI
  2. Set up environment:

    conda create -n bagel python=3.11.12 -y
    conda activate bagel
    pip install -r requirements.txt
    pip install flash_attn==2.5.8 --no-build-isolation

    2.5. Manually install the dfloat11 package:

      pip install dfloat11
  3. Download pretrained checkpoint and/or DFloat11 compressed model:

    ### Regular BAGEL-7B Model
    
    from huggingface_hub import snapshot_download
    
    save_dir = "models/BAGEL-7B-MoT"
    repo_id = "ByteDance-Seed/BAGEL-7B-MoT"
    cache_dir = save_dir + "/cache"
    
    snapshot_download(cache_dir=cache_dir,
     local_dir=save_dir,
     repo_id=repo_id,
     local_dir_use_symlinks=False,
     resume_download=True,
     allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
    )

    ### DFloat11 Model - Allows for 24GB VRAM Single GPU inference without quality loss
    
    from huggingface_hub import snapshot_download
    
    save_dir = "models/BAGEL-7B-MoT-DF11"
    repo_id = "DFloat11/BAGEL-7B-MoT-DF11"
    cache_dir = save_dir + "/cache"
    
    snapshot_download(cache_dir=cache_dir,
     local_dir=save_dir,
     repo_id=repo_id,
     local_dir_use_symlinks=False,
     resume_download=True,
     allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt", "*.model", "vae/*"],
    )
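
The two download scripts differ mainly in their `allow_patterns` lists. The sketch below shows what the extra `"*.model"` and `"vae/*"` patterns would admit, assuming the patterns are applied as shell-style globs against repo-relative paths. The `is_allowed` helper and all file names are made up for illustration; they are not taken from either repository.

```python
from fnmatch import fnmatchcase

BASE_PATTERNS = ["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"]
DF11_PATTERNS = BASE_PATTERNS + ["*.model", "vae/*"]


def is_allowed(filename: str, patterns: list[str]) -> bool:
    """True if the repo-relative filename matches at least one glob pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Made-up file names, for illustration only.
candidates = [
    "config.json",
    "weights.safetensors",
    "tokenizer.model",
    "vae/decoder.ckpt",
    "assets/teaser.png",
]

for name in candidates:
    print(name, is_allowed(name, BASE_PATTERNS), is_allowed(name, DF11_PATTERNS))
```

Files that match no pattern are skipped by the download, which keeps large non-essential assets out of the local model directory.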

▶️ Usage

Run the modified Gradio WebUI script:

# For a GPU with 32GB+ VRAM, or multiple GPUs. Saves output to ./output/
python app.py

# To specify a different output directory
python app.py --output_dir /path/to/your/output

# For a GPU with 12~32GB VRAM: NF4 quantization, with the Chinese UI
python app.py --mode 2 --zh

# Different requirements apply when using the DFloat11 model
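
For orientation, the three flags used above could be declared along the following lines. This is a hedged sketch, not the actual argument parser in app.py: the `build_parser` helper, the help texts, and the default of `1` for `--mode` are assumptions; only `--output_dir`, `--mode 2`, and `--zh` appear in the commands above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="BagelUI launcher (illustrative sketch)")
    parser.add_argument("--output_dir", default="output",
                        help="directory where generated images are saved")
    # Assumption: mode 1 is the unquantized default; only mode 2 (NF4)
    # is documented in the usage commands.
    parser.add_argument("--mode", type=int, default=1,
                        help="loading mode; 2 selects NF4 quantization")
    parser.add_argument("--zh", action="store_true",
                        help="use the Chinese UI")
    return parser


# Equivalent to: python app.py --mode 2 --zh
args = build_parser().parse_args(["--mode", "2", "--zh"])
print(args.output_dir, args.mode, args.zh)
```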

❤️ Based on the Original BAGEL Project

This work is based on the amazing BAGEL project. Please refer to the original repository for core model details, training guidelines, and benchmarks.

✍️ Citation

@article{deng2025bagel,
  title   = {Emerging Properties in Unified Multimodal Pretraining},
  author  = {Deng, Chaorui and Zhu, Deyao and Li, Kunchang and Gou, Chenhui and Li, Feng and Wang, Zeyu and Zhong, Shu and Yu, Weihao and Nie, Xiaonan and Song, Ziang and Shi, Guang and Fan, Haoqi},
  journal = {arXiv preprint arXiv:2505.14683},
  year    = {2025}
}

📜 License

BAGEL is licensed under the Apache 2.0 license.

Website | Paper | Model on Hugging Face | Official Demo | Official Hugging Face Space | Discord | Email
