Multimodal Speculative Decoding (MSD)

📄 Paper on arXiv: Speculative Decoding Reimagined for Multimodal Large Language Models

🧠 MSD Models

You can directly use the Multimodal Speculative Decoding (MSD) models available on Hugging Face.
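If the checkpoints are published as standard Hugging Face repositories, they can be fetched with the Hugging Face CLI. This is a minimal sketch; the repo id and local directory below are placeholders, not names taken from this repo:

# Hedged sketch: download an MSD checkpoint (repo id is a placeholder)
pip install -U "huggingface_hub[cli]"
huggingface-cli download <hf_repo_id> --local-dir ./msd_models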


🧱 1. Setup & Installation

# Create and activate the environment (CUDA 12.1 is assumed to be installed and configured)
conda create -n msd python=3.10 -y
conda activate msd

# Install the bundled LLaVA, EAGLE, and lmms-eval packages in editable mode
cd LLaVA
pip install -e .
cd ../EAGLE
pip install -e .
cd ../lmms-eval
pip install -e .
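Before moving on, a quick sanity check of the environment may help. This is a hedged sketch, assuming the three editable installs expose the import names llava, eagle, and lmms_eval:

# Verify that PyTorch sees the GPU and the editable installs resolve
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import llava, eagle, lmms_eval; print('imports OK')"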

📥 2. Download Datasets

Download the annotations used for instruction tuning:

Then download the image data from the following datasets: COCO (train2017), GQA, OCR-VQA, TextVQA, and Visual Genome.

After downloading, organize the data under ./image_data in the following structure:

├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
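For reference, a download sketch along these lines may work. The URLs below are the datasets' commonly used public mirrors and are an assumption, not links taken from this repo; OCR-VQA is distributed through its own download script and is omitted here. Paths may need adjusting to match the tree above.

# Hedged sketch: fetch and unpack the image data into ./image_data
# (URLs are assumptions based on the datasets' usual hosting, not from this repo)
mkdir -p image_data && cd image_data

# COCO -> coco/train2017
mkdir -p coco
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip -d coco

# GQA -> gqa/images
mkdir -p gqa
wget https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip
unzip images.zip -d gqa

# TextVQA -> textvqa/train_images
mkdir -p textvqa
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip -d textvqa

# Visual Genome -> vg/VG_100K and vg/VG_100K_2
mkdir -p vg && cd vg
wget https://cs.stanford.edu/people/rak248/VG_100K/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K/images2.zip
unzip images.zip && unzip images2.zip
cd ../..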

⚙️ 3. Data Processing

Use the following script to generate training data. You can control the target model by setting the --model_type argument (e.g., llava_v15 or qwen2_vl):

cd EAGLE/eagle/ge_data

CUDA_VISIBLE_DEVICES=0 python -m eagle.ge_data.allocation \
    --outdir <output_data_dir> \
    --model_type <model_type> \
    --model <base_model_path> \
    --image_data_path <image_data_dir> \
    --json_data_path <annotation_file>
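For example, with LLaVA-1.5 as the target model, a filled-in invocation might look like this; all paths below are placeholders for illustration:

# Hedged example: generate training data for LLaVA-1.5 (paths are placeholders)
CUDA_VISIBLE_DEVICES=0 python -m eagle.ge_data.allocation \
    --outdir ./train_data/llava_v15 \
    --model_type llava_v15 \
    --model ./models/llava-v1.5-7b \
    --image_data_path ./image_data \
    --json_data_path ./annotations/instruct_tuning.json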

🏋️ 4. Train the Model

Use DeepSpeed to train the speculative decoding model. Modify the following paths according to your setup:

cd EAGLE/eagle/train

deepspeed --master_port 29504 --include localhost:0 main_deepspeed.py \
    --deepspeed_config ds_config.json \
    --tmpdir_v <visual_data_path> \
    --tmpdir_t <text_data_path> \
    --basepath <base_llm_path> \
    --cpdir <checkpoint_output_dir> \
    --config <training_config_path>

Parameters:

  • <visual_data_path>: directory containing preprocessed visual data
  • <text_data_path>: directory containing preprocessed text data
  • <base_llm_path>: path to the base multimodal LLM
  • <checkpoint_output_dir>: directory where training checkpoints are saved
  • <training_config_path>: training configuration file, e.g., llava_v15_7B_config.json
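A filled-in example might look like the following; all paths are placeholders and should be replaced with your own:

# Hedged example: train an MSD model for LLaVA-1.5 (paths are placeholders)
deepspeed --master_port 29504 --include localhost:0 main_deepspeed.py \
    --deepspeed_config ds_config.json \
    --tmpdir_v ./train_data/llava_v15/visual \
    --tmpdir_t ./train_data/llava_v15/text \
    --basepath ./models/llava-v1.5-7b \
    --cpdir ./checkpoints/msd_llava_v15 \
    --config llava_v15_7B_config.json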

📊 5. Evaluate the Model

Run evaluation with lmms-eval. The following example evaluates on the ChartQA task:

CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 --main_process_port=29506 -m lmms_eval \
    --model <model_name> \
    --model_args pretrained="<base_model_path>" \
    --msd_model_path <msd_model_path> \
    --tasks chartqa \
    --batch_size 1 \
    --gen_kwargs temperature=0 \
    --use_msd

Parameters:

  • <model_name>: short name identifier of your model, e.g., llava_msd or qwen2_vl_msd
  • <base_model_path>: path to the base pretrained model
  • <msd_model_path>: path to the MSD model
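For instance, evaluating a LLaVA-based MSD model on ChartQA might look like this; the paths are placeholders for illustration:

# Hedged example: evaluate a LLaVA-based MSD model on ChartQA (paths are placeholders)
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 --main_process_port=29506 -m lmms_eval \
    --model llava_msd \
    --model_args pretrained="./models/llava-v1.5-7b" \
    --msd_model_path ./checkpoints/msd_llava_v15 \
    --tasks chartqa \
    --batch_size 1 \
    --gen_kwargs temperature=0 \
    --use_msd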
