MARTI is an open-source framework for training LLM-based Multi-Agent Systems (MAS) with Reinforcement Learning (RL). It enables powerful, scalable, and adaptive workflows by combining centralized multi-agent interactions with distributed policy training. MARTI supports both built-in graph-based workflows and popular third-party multi-agent frameworks. We hope that MARTI not only advances reasoning capabilities beyond those of individual large language models or reasoning models, but also fosters collective intelligence as a step toward general artificial intelligence.
MARTI is still at a very early experimental stage. We are actively developing more powerful LLM-based multi-agent RL approaches and warmly welcome collaborations in this direction.
- [2025-05-27] We release the codebase of the MARTI framework. Welcome to try it out for LLM-based multi-agent reinforcement learning.
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference
Modern LLM applications often rely on complex task-solving strategies. While single-agent LLMs struggle with scalability and context tracking, multi-agent systems (MAS) offer a promising alternative. MARTI fills the gap between inference and training in MAS by introducing reinforcement learning for structured, collaborative agent behavior.
We designed the MARTI framework following the principle of centralized multi-agent interaction with distributed policy training, where all agent interactions and reward allocation occur centrally while policy training is distributed across individual agents. As illustrated in Figure 1, MARTI comprises three core modules: Multi-Agent World, Centralized Rewarding, and Single Agent Trainer.
Figure 1: Overview of Core Components of MARTI
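At a high level, the three modules compose a loop: the Multi-Agent World runs centralized agent interactions and collects trajectories, the Centralized Rewarding module assigns per-agent credit, and each Single Agent Trainer updates its own policy in a distributed fashion. The sketch below is purely conceptual; the class and method names are illustrative assumptions, not MARTI's actual API:

```python
# Conceptual sketch of the MARTI training loop; all names are illustrative only.
def train_marti(world, rewarder, trainers, tasks, num_iterations):
    """world: centralized Multi-Agent World that executes the agent graph.
    rewarder: Centralized Rewarding module (rule-based, LLM-as-Judge, ...).
    trainers: one distributed Single Agent Trainer per agent policy."""
    for _ in range(num_iterations):
        # 1. Centralized interaction: roll out the agent graph on a task batch.
        trajectories = world.rollout(tasks)

        # 2. Centralized rewarding: score outcomes and allocate per-agent credit.
        rewards = rewarder.assign_credit(trajectories)

        # 3. Distributed training: each agent updates its own policy
        #    (e.g., with PPO / GRPO / REINFORCE++) on its share of the data.
        for agent_id, trainer in trainers.items():
            trainer.update(trajectories.for_agent(agent_id),
                           rewards.for_agent(agent_id))
```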
Key Features:
- Multi-Agent Inference + RL Training in a unified framework
- Graph-based workflows (debate, chain-of-agents, mixture-of-agents)
- Support for heterogeneous models within the same agent graph
- Built-in credit assignment and reward shaping strategies
- Support for diverse RL algorithms (PPO, GRPO, REINFORCE++, TTRL)
- Third-party integration with AutoGen and CAMEL (experimental)
- Advanced performance on reasoning benchmarks (e.g., AIME)
Additionally, building on single-agent RL frameworks like OpenRLHF and verl, MARTI supports the vLLM v1 Engine and a Hybrid Engine to enable fast and efficient training.
git clone https://github.com/TsinghuaC3I/MARTI.git
cd MARTI
pip install -r requirements.txt
Follow the setup instructions for dependencies, including OpenRLHF, Ray, and vLLM.
MARTI supports:
- Built-in DAG-based workflows: debate, mixture-of-agents, chain-of-agents
- Third-party frameworks: AutoGen and CAMEL (Experimental)
Example:
MODEL_DIR="Path to models, like Qwen2.5-3B"
# See the script for more inference examples
bash scripts/run_test_mas.sh ${MODEL_DIR}
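Under the hood, a built-in workflow is a small DAG of role-playing agents. The snippet below sketches how a three-agent debate round could be wired up against an OpenAI-compatible endpoint (e.g., a vLLM server); the endpoint URL, prompts, and helper names are assumptions for illustration, not MARTI's actual API:

```python
# Illustrative debate round over an OpenAI-compatible endpoint (e.g., vLLM serve).
from collections import Counter
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed endpoint
MODEL = "Qwen2.5-3B"
SYSTEM = "You are a careful math solver. End with 'Answer: <value>'."

def ask(user_prompt):
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": user_prompt}],
        temperature=0.8,
    )
    return resp.choices[0].message.content

def debate(question, num_agents=3):
    # Round 1: each agent answers independently.
    answers = [ask(question) for _ in range(num_agents)]
    # Round 2: each agent revises after reading its peers' answers.
    revised = []
    for i in range(num_agents):
        peers = "\n\n".join(a for j, a in enumerate(answers) if j != i)
        revised.append(ask(f"{question}\n\nOther agents answered:\n{peers}\n"
                           "Reconsider and give your final answer."))
    # Aggregate with majority voting over the final 'Answer:' lines.
    finals = [a.rsplit("Answer:", 1)[-1].strip() for a in revised]
    return Counter(finals).most_common(1)[0][0]

print(debate("What is 17 * 24?"))
```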
MARTI supports:
- Rule-based rewards (Reward Shaping)
- Generative reward models (LLM-as-Judge) (Experimental)
- Tree-based AgentPRM (ImplicitPRM) (Experimental)
- Supervised fine-tuning + RL (e.g., PPO, GRPO)
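As a concrete illustration of the first option, a rule-based reward for math tasks typically extracts the final answer, compares it with the reference, and adds a small format-following bonus as shaping. The function below is a hedged sketch, not MARTI's built-in implementation:

```python
import re

def rule_based_reward(response: str, reference: str) -> float:
    """Toy rule-based reward with simple shaping (illustrative only):
    1.0 for a correct final answer, 0.1 for a well-formatted but wrong
    answer, 0.0 if no recognizable final answer is found."""
    match = re.search(r"\\boxed\{([^}]*)\}|Answer:\s*(.+)", response)
    if match is None:
        return 0.0                       # no recognizable final answer
    answer = (match.group(1) or match.group(2)).strip()
    if answer == reference.strip():
        return 1.0                       # correct answer
    return 0.1                           # followed the format, wrong answer

print(rule_based_reward("... so the result is \\boxed{42}", "42"))  # 1.0
```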
Example:
# Minimum hardware requirement for training with 3 Qwen2.5-3B agents: approximately 6x80GB GPUs
MODEL_DIR="Path to models, like Qwen2.5-3B"
WANDB_KEY="API key of wandb"
# Train Single Agent with GRPO
bash scripts/run_train_grpo.sh ${MODEL_DIR} ${WANDB_KEY}
# Train Multi-Agent Debate with Reinforce++
bash scripts/run_train_mad.sh ${MODEL_DIR} ${WANDB_KEY}
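For the GRPO run above, advantages are computed group-relatively: the rewards of rollouts sampled for the same prompt are standardized against each other, so no value critic is needed. A minimal sketch of that computation (independent of MARTI's internals):

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-6):
    """Standardize each rollout's reward against the other rollouts sampled
    for the same prompt (group-relative baseline, as in GRPO)."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 rollouts for one prompt, reward 1 for a correct final answer.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # approx [ 1., -1., -1.,  1.]
```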
We employ the MARTI framework to train both base and reasoning models, specifically Qwen2.5-3B and DeepScaleR-1.5B-Preview. For Qwen2.5-3B, we implement DeepSeek-R1-Zero-like reinforcement learning training using Level 3-5 samples from the MATH dataset. The DeepScaleR-1.5B-Preview model, which exhibits strong inherent reasoning capabilities but presents training challenges, undergoes Test-Time Reinforcement Learning (TTRL) adaptation on AIME benchmark data. For multi-agent reinforcement learning, we employ a cluster configuration consisting of 3 nodes, each equipped with 8 A800 80GB GPUs, allocating one full node per agent.
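TTRL needs no ground-truth labels at adaptation time: the majority-voted answer across sampled rollouts serves as a pseudo-label, and each rollout is rewarded for agreeing with it. A hedged sketch of that pseudo-labeling step (not the exact MARTI/TTRL code):

```python
from collections import Counter

def ttrl_rewards(sampled_answers):
    """Test-time RL style pseudo-rewards (illustrative sketch): the
    majority-voted answer acts as the pseudo-label, and each rollout is
    rewarded for matching it."""
    pseudo_label, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in sampled_answers]

# Example: 5 rollouts on one AIME problem; "204" wins the vote.
print(ttrl_rewards(["204", "204", "128", "204", "72"]))
# -> [1.0, 1.0, 0.0, 1.0, 0.0]
```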
We compare non-reasoning and reasoning models under various configurations and find that, under conventional training, simple majority voting consistently outperforms multi-agent workflows. This reflects known limitations of current LLM-based agent systems, such as poor role adherence and ineffective inter-agent communication.
To address this, MARTI enhances model reasoning through structured agent interactions. As shown in Figure 2 and Figure 3, our experiments show that:
- MARTI-trained base models outperform standard RL setups and rival instruction-tuned models.
- Large reasoning models trained with MARTI using TTRL achieve state-of-the-art results on challenging tasks (e.g., 66.7 AIME score with Multi-Agent Debates).
- Multi-agent RL consistently surpasses single-agent systems in performance under the same compute budget.
Figure 2: Average scores of Qwen2.5-3B base and instruct models under different budgets and settings
Figure 3: Average scores of reasoning models under different budgets and settings
We conduct multi-agent debate training with Qwen2.5-3B. The Qwen2.5-3B model is trained using REINFORCE++ on Level 3 to 5 samples from the MATH-500 dataset.
Figure 4: Accuracy of MAD (Qwen2.5-3B, MATH) on AMC and MATH
Figure 5: Training Dynamics of MAD (Qwen2.5-3B, MATH)
We evaluate a mixture-of-agents approach using the Qwen2.5-3B model, trained on Levels 3 through 5 of the MATH-500 training dataset.
Figure 6: Accuracy of MoA (Qwen2.5-3B, MATH) on AMC and MATH
Figure 7: Training Dynamics of MoA (Qwen2.5-3B, MATH)
- Release MARTI Technical Report
- Initial support for agentic tasks (e.g., GAIA benchmark)
- More features are work in progress
MARTI is developed primarily based on OpenRLHF. We would like to express our gratitude to the developers of OpenRLHF, as well as to the teams behind vLLM, Ray and DeepSpeed for their invaluable contributions.
- Project Lead: Kaiyan Zhang
- Agent Group: Runze Liu, Kaiyan Zhang, Kai Tian, Guoli Jia, Xingtai Lv, Che Jiang
- RL Group: Kaiyan Zhang, Xuekai Zhu, Sihang Zeng, Yuchen Fan, Yuxin Zuo
For the full list of contributors, please refer to the author list in the citation. We are also deeply grateful to everyone who engaged in discussions and provided valuable feedback throughout the development of this project.
For issues or inquiries:
- Kaiyan Zhang, Tsinghua University (zhang-ky22@mails.tsinghua.edu.cn)
- Biqing Qi, Shanghai AI Lab (qibiqing@pjlab.org.cn)
If you use MARTI in your research, please cite the project:
@misc{marti2025,
title={MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference},
author={Kaiyan Zhang and Runze Liu and Xuekai Zhu and Kai Tian and Sihang Zeng and Guoli Jia and Yuchen Fan and Xingtai Lv and Yuxin Zuo and Che Jiang and Ziyang Liu and Jianyu Wang and Yuru Wang and Ruotong Zhao and Ermo Hua and Yibo Wang and Shijie Wang and Junqi Gao and Xinwei Long and Youbang Sun and Zhiyuan Ma and Ganqu Cui and Lei Bai and Ning Ding and Biqing Qi and Bowen Zhou},
year={2025},
institution={Tsinghua University and Shanghai AI Lab},
url={https://github.com/TsinghuaC3I/MARTI}
}