DuoDecoding

arXiv:2503.00784 · Hugging Face Paper Page

This repo contains the implementation for the paper DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting. We propose deploying the draft model on the CPU, which shifts the drafting computational overhead away from the GPU and enables drafting and target-model decoding to run in parallel.
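The repo implements this with real models; purely as an illustration of the underlying idea, the standard speculative-decoding accept/reject rule can be sketched with stand-in token distributions. Everything below (the vocabulary size, the fixed distributions, the function names) is hypothetical and not taken from this codebase:

```python
import random

random.seed(0)
VOCAB = 4  # toy vocabulary size (hypothetical)

def draft_dist(prefix):
    # Stand-in for the cheap draft model (e.g. run on CPU): a fixed distribution.
    return [0.4, 0.3, 0.2, 0.1]

def target_dist(prefix):
    # Stand-in for the large target model (e.g. run on GPU).
    return [0.3, 0.3, 0.3, 0.1]

def sample(dist):
    return random.choices(range(VOCAB), weights=dist, k=1)[0]

def speculative_step(prefix, gamma=4):
    # 1) Draft model proposes gamma tokens autoregressively.
    drafted = []
    for _ in range(gamma):
        drafted.append(sample(draft_dist(prefix + drafted)))
    # 2) Target model scores all drafted tokens in one parallel pass,
    #    then accepts/rejects them left to right.
    accepted = []
    for tok in drafted:
        q = draft_dist(prefix + accepted)
        p = target_dist(prefix + accepted)
        if random.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)  # accept: token keeps the target distribution
        else:
            # Reject: resample from the residual max(0, p - q), renormalized,
            # and stop. (The extra "bonus" token drawn from the target when
            # everything is accepted is omitted here for brevity.)
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            accepted.append(sample([r / z for r in residual]))
            break
    return accepted

out = speculative_step([])
print(out)
```

Each call returns between 1 and gamma tokens, which is why drafting can be much cheaper than decoding every token with the target model alone; DuoDecoding's contribution is running the draft side on CPU concurrently with the GPU target model and adapting how many draft sequences are produced.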

Setup

  1. Create a conda environment with Python 3.10:
conda create -n duodec python=3.10
conda activate duodec
  2. Install Python bindings for llama.cpp:
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
  3. Install the other required packages:
git clone https://github.com/KaiLv69/DuoDecoding.git
cd DuoDecoding
pip install -r requirements.txt
  4. Set the model paths in src/utils.py.
  5. (Optional) Install draftretriever and create a datastore for REST:

bash src/model/rest/datastore/datastore.sh
pip install src/model/rest/DraftRetriever/wheels/draftretriever-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl

Evaluation

We provide evaluation scripts for the experiments reported in our paper.

  • To evaluate the baseline methods on Llama-2-7b:
bash cmds/baseline_llama.sh
  • To evaluate DuoDecoding on Llama-2-7b:
bash cmds/duodec_llama.sh
  • To evaluate the baseline methods on Vicuna-7b-v1.5:
bash cmds/baseline_vicuna.sh
  • To evaluate DuoDecoding on Vicuna-7b-v1.5:
bash cmds/duodec_vicuna.sh

Bugs and Questions

If you have any questions about the code or the paper, feel free to email Kai (klv23@m.fudan.edu.cn). If you encounter a problem when using the code, or want to report a bug, please open an issue and describe the problem in detail so we can help you more quickly!

Acknowledgments

This repo builds upon the following excellent repos: llama-cpp-python, Spec-Bench, parallelspeculativedecoding.

Citation

Please cite our paper if you find the repo helpful:

@misc{lv2025duodecodinghardwareawareheterogeneousspeculative,
      title={DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting}, 
      author={Kai Lv and Honglin Guo and Qipeng Guo and Xipeng Qiu},
      year={2025},
      eprint={2503.00784},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.00784}, 
}
