This repo contains the implementation for the paper "DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting". We propose deploying the draft model on the CPU, which shifts the computational overhead of drafting to the CPU and enables parallel decoding.
- Create a conda environment with Python 3.10:
conda create -n duodec python=3.10
conda activate duodec
- Install Python bindings for llama.cpp:
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
- Install other required packages:
git clone https://github.com/KaiLv69/DuoDecoding.git
cd DuoDecoding
pip install -r requirements.txt
- Set the model path in src/utils.py.
- (Optional) Install draftretriever and create a datastore for REST:
bash src/model/rest/datastore/datastore.sh
pip install src/model/rest/DraftRetriever/wheels/draftretriever-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl
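After the installation steps above, a quick import check can confirm that the bindings are visible in the active environment. This is a minimal sketch, not part of the repo: `check_module` and the `PYTHON` override are hypothetical helpers, `llama_cpp` is the import name of llama-cpp-python, and `draftretriever` is assumed to be the import name of the optional wheel.

```shell
# Report whether a Python module can be imported in the active environment.
# PYTHON is a hypothetical override for the interpreter; it defaults to python3.
check_module() {
  if "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

check_module llama_cpp        # provided by llama-cpp-python
check_module draftretriever   # optional, only needed for REST (assumed import name)
```

A "missing" result for draftretriever is expected if the optional REST step was skipped.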
We provide evaluation scripts for the experiments reported in our paper.
- To evaluate the baseline methods on Llama-2-7b:
bash cmds/baseline_llama.sh
- To evaluate DuoDecoding on Llama-2-7b:
bash cmds/duodec_llama.sh
- To evaluate the baseline methods on Vicuna-7b-v1.5:
bash cmds/baseline_vicuna.sh
- To evaluate DuoDecoding on Vicuna-7b-v1.5:
bash cmds/duodec_vicuna.sh
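To run all four evaluations in one go, the commands above could be wrapped in a small loop. The sketch below is an illustration rather than a script shipped with the repo: `run_all` and the `logs` directory are hypothetical, and it assumes it is invoked from the repository root so that the `cmds/` scripts resolve.

```shell
# Run each evaluation script in turn, writing one log file per script.
# Arguments: script directory (default: cmds), log directory (default: logs).
run_all() {
  local dir="${1:-cmds}" log_dir="${2:-logs}" name script
  for name in baseline_llama duodec_llama baseline_vicuna duodec_vicuna; do
    script="$dir/$name.sh"
    if [ ! -f "$script" ]; then
      echo "skip: $script not found"
      continue
    fi
    mkdir -p "$log_dir"
    echo "running: $script"
    # Keep going if one script fails, so the remaining evaluations still run.
    bash "$script" > "$log_dir/$name.log" 2>&1 \
      || echo "failed: $script (see $log_dir/$name.log)"
  done
}

run_all cmds logs
```

The scripts run sequentially, so they do not compete for the same GPU and CPU cores; a failed script is reported and the loop moves on to the next one.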
If you have any questions related to the code or the paper, feel free to email Kai (klv23@m.fudan.edu.cn). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you faster and more effectively!
This repo builds upon the following excellent repos: llama-cpp-python, Spec-Bench, parallelspeculativedecoding.
Please cite our paper if you find the repo helpful:
@misc{lv2025duodecodinghardwareawareheterogeneousspeculative,
title={DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting},
author={Kai Lv and Honglin Guo and Qipeng Guo and Xipeng Qiu},
year={2025},
eprint={2503.00784},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.00784},
}