🔥 News!!!
- [2025/04] We release the model checkpoints and inference code. [New!]
In this work, we embrace the RL paradigm and introduce ReTool, a Tool-augmented Reinforcement learning framework explicitly designed to guide LLMs towards optimal strategies for leveraging external computational tools during reasoning. Our comprehensive experiments on AIME 2024 and AIME 2025 demonstrate that ReTool not only achieves superior accuracy compared to conventional text-based RL approaches, but also converges in significantly fewer training steps.
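For intuition, the snippet below sketches how a tool-integrated rollout can interleave text generation with code execution feedback. This is an illustrative sketch only: the `<code>` / `<interpreter>` tags, the `generate` helper, and the `sandbox` executor are assumptions made for illustration, not the released implementation.

```python
def tool_augmented_rollout(llm, sandbox, prompt, max_rounds=8):
    """Illustrative sketch: interleave generation with code-execution feedback."""
    trajectory = prompt
    for _ in range(max_rounds):
        # Hypothetical helper: generate until "</code>" or end-of-sequence,
        # returning the generated text including the stop string if it was hit.
        chunk = llm.generate(trajectory, stop="</code>", include_stop=True)
        trajectory += chunk
        if not chunk.endswith("</code>"):
            break  # no tool call emitted: the reasoning trace is complete
        # Extract the latest code block and run it in a sandboxed interpreter.
        code = trajectory.rsplit("<code>", 1)[-1].rsplit("</code>", 1)[0]
        result = sandbox.run(code)  # hypothetical sandboxed Python executor
        # Feed the execution result back so the model can continue reasoning.
        trajectory += f"<interpreter>{result}</interpreter>"
    return trajectory
```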
🚀 ReTool achieves 67.0% accuracy on AIME 2024 and 49.3% on AIME 2025 based on the Qwen2.5-32B-Instruct model, outperforming the text-based RL baseline while using fewer than 50% of its training steps.
We provide the model weights of ReTool-Qwen-32B and ReTool-DeepSeek-R1-Distill-Qwen-32B, which are trained from Qwen2.5-32B-Instruct and DeepSeek-R1-Distill-Qwen-32B, respectively. Note: ReTool-Qwen-32B achieves 67% on AIME 2024, and ReTool-DeepSeek-R1-Distill-Qwen-32B achieves 72% on AIME 2024.
```bash
pip install vllm==0.7.3
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install deepspeed
pip install accelerate
pip install datasets
pip install "git+https://github.com/tongyx361/symeval.git"
pip install timeout_decorator
```
To speed up the open-sourcing process, we currently use STILL3's inference framework to evaluate our trained checkpoints.
Quick start for model inference:
```bash
cd evaluation
bash scripts/eval.sh
```
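If you prefer to call the released weights directly, here is a minimal sketch using vLLM (pinned in the installation step above). The model path is a placeholder: point it at wherever you download ReTool-Qwen-32B or ReTool-DeepSeek-R1-Distill-Qwen-32B, and adjust `tensor_parallel_size` to your GPU count.

```python
from vllm import LLM, SamplingParams

# Placeholder path: replace with your local copy (or hub id) of the checkpoint.
llm = LLM(model="path/to/ReTool-Qwen-32B", tensor_parallel_size=4)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

prompt = "What is the remainder when 2^100 is divided by 7?"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```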
We provide training and validation datasets for ReTool.
- Cold-Start: ReTool-SFT.
- RL Training: DAPO-Math-17k by DAPO, thanks for their great work!
- RL Validation: AIME 2024, AIME 2025 (can be found in `evaluation/dataset`; a loading example is sketched below)
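For reference, the snippet below shows one way to inspect a validation split with the `datasets` library installed earlier. The file name and format are assumptions; adjust them to the files actually shipped in `evaluation/dataset`.

```python
from datasets import load_dataset

# Hypothetical file name: check evaluation/dataset for the actual file layout.
aime24 = load_dataset("json", data_files="evaluation/dataset/aime2024.jsonl", split="train")
print(len(aime24))
print(aime24[0])
```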
We thank verl for providing the awesome open-source RL infrastructure.
If you find our project helpful, please cite:
```bibtex
@misc{feng2025retoolreinforcementlearningstrategic,
      title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs},
      author={Jiazhan Feng and Shijue Huang and Xingwei Qu and Ge Zhang and Yujia Qin and Baoquan Zhong and Chengquan Jiang and Jinxin Chi and Wanjun Zhong},
      year={2025},
      eprint={2504.11536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.11536},
}
```