🔥 News!!!
- [2025/04] We release the model checkpoints and inference code. [New!]
In this work, we embrace the RL paradigm and introduce ReTool, a Tool-augmented Reinforcement learning framework explicitly designed to guide LLMs towards optimal strategies for leveraging external computational tools during reasoning. Our comprehensive experiments on AIME 2024 and AIME 2025 demonstrate that ReTool not only achieves superior accuracy compared to conventional text-based RL approaches, but also converges in significantly fewer training steps.
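For intuition, the snippet below sketches how a tool-integrated rollout can interleave text generation with code execution feedback. This is an illustrative sketch only: the `<code>` / `<interpreter>` tags, the `generate` helper, and the `sandbox` executor are assumptions made for illustration, not the released implementation.

```python
def tool_augmented_rollout(llm, sandbox, prompt, max_rounds=8):
    """Illustrative sketch: interleave generation with code-execution feedback."""
    trajectory = prompt
    for _ in range(max_rounds):
        # Hypothetical helper: generate until "</code>" or end-of-sequence,
        # returning the generated text including the stop string if it was hit.
        chunk = llm.generate(trajectory, stop="</code>", include_stop=True)
        trajectory += chunk
        if not chunk.endswith("</code>"):
            break  # no tool call emitted: the reasoning trace is complete
        # Extract the latest code block and run it in a sandboxed interpreter.
        code = trajectory.rsplit("<code>", 1)[-1].rsplit("</code>", 1)[0]
        result = sandbox.run(code)  # hypothetical sandboxed Python executor
        # Feed the execution result back so the model can continue reasoning.
        trajectory += f"<interpreter>{result}</interpreter>"
    return trajectory
```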
🚀 ReTool achieves 67.0% accuracy on AIME 2024 and 49.3% on AIME 2025 based on the Qwen2.5-32B-Instruct model, outperforming the text-based RL baseline while using fewer than 50% of its training steps.
We provide the model weights of ReTool-Qwen-32B and ReTool-DeepSeek-R1-Distill-Qwen-32B, which are trained from Qwen2.5-32B-Instruct and DeepSeek-R1-Distill-Qwen-32B, respectively. Note: ReTool-Qwen-32B achieves 67% on AIME 2024, and ReTool-DeepSeek-R1-Distill-Qwen-32B achieves 72% on AIME 2024.
```bash
pip install vllm==0.7.3
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install deepspeed
pip install accelerate
pip install datasets
pip install "git+https://github.com/tongyx361/symeval.git"
pip install timeout_decorator
```
To speed up the open-sourcing process, we currently use STILL3's inference framework to evaluate our trained checkpoints.
Quick start for model inference:
```bash
cd evaluation
bash scripts/eval.sh
```
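If you prefer to call the released weights directly, here is a minimal sketch using vLLM (pinned in the installation step above). The model path is a placeholder: point it at wherever you download ReTool-Qwen-32B or ReTool-DeepSeek-R1-Distill-Qwen-32B, and adjust `tensor_parallel_size` to your GPU count.

```python
from vllm import LLM, SamplingParams

# Placeholder path: replace with your local copy (or hub id) of the checkpoint.
llm = LLM(model="path/to/ReTool-Qwen-32B", tensor_parallel_size=4)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

prompt = "What is the remainder when 2^100 is divided by 7?"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```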
We provide training and validation datasets for ReTool.
- Cold-Start: ReTool-SFT.
- RL Training: DAPO-Math-17k by DAPO, thanks for their great work!
- RL Validation: AIME 2024, AIME 2025 (can be found in `evaluation/dataset`; a loading example is sketched below)
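For reference, the snippet below shows one way to inspect a validation split with the `datasets` library installed earlier. The file name and format are assumptions; adjust them to the files actually shipped in `evaluation/dataset`.

```python
from datasets import load_dataset

# Hypothetical file name: check evaluation/dataset for the actual file layout.
aime24 = load_dataset("json", data_files="evaluation/dataset/aime2024.jsonl", split="train")
print(len(aime24))
print(aime24[0])
```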
We thank verl for providing the awesome open-source RL infrastructure.
If you find our project helpful, please cite:
```bibtex
@misc{feng2025retoolreinforcementlearningstrategic,
      title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs},
      author={Jiazhan Feng and Shijue Huang and Xingwei Qu and Ge Zhang and Yujia Qin and Baoquan Zhong and Chengquan Jiang and Jinxin Chi and Wanjun Zhong},
      year={2025},
      eprint={2504.11536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.11536},
}
```