(Recon) Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou*†1, Shaobo Wang*2,3, Xingyu Dong*4, Xiangqi Jin2, Yifang Chen5, Yue Min2,
Kexin Yang3, Xingzhang Ren3, Dayiheng Liu3, Linfeng Zhang†2
*Equal contribution, †Corresponding authors
1Duke University, 2EPIC Lab, Shanghai Jiao Tong University, 3Qwen Team, Alibaba Group,
4University of Pennsylvania, 5The University of Chicago
Training large language models (LLMs) directly for multi-agent scenarios poses significant challenges, such as complex reward modeling, coordination difficulties, and competing objectives. In contrast, post-training techniques like Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) have shown promise in enhancing reasoning abilities. This leads to a central question:
Can post-training techniques generalize effectively to multi-agent scenarios?
Recon addresses this by adopting economic reasoning as a structured testbed. We curate 2,100 high-quality economic reasoning problems across 15 categories and employ a two-stage post-training pipeline (Base→SFT→GRPO).
Our results demonstrate that domain-aligned post-training enables LLMs to generalize from textbook economic problems to emergent, rational behavior in multi-agent games—bridging the gap between single-agent and strategic multi-agent reasoning.
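At the RL stage, RLVR-style training only requires a reward that can be checked programmatically. The sketch below illustrates one plausible form of such a check, an exact/numeric match on a final boxed answer; the `extract_answer` helper, the boxed-answer format, and the tolerance are assumptions for illustration, not the exact reward used in Recon.

```python
import re

def extract_answer(text: str) -> str | None:
    """Pull the final \\boxed{...} span from a completion (assumed answer format)."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, ground_truth: str, tol: float = 1e-4) -> float:
    """Binary reward: 1.0 if the predicted final answer matches the reference, else 0.0."""
    pred = extract_answer(completion)
    if pred is None:
        return 0.0
    try:
        # Numeric answers: compare within a small tolerance.
        return float(abs(float(pred) - float(ground_truth)) < tol)
    except ValueError:
        # Symbolic or multiple-choice answers: normalized string match.
        return float(pred.lower() == ground_truth.strip().lower())
```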
Recon follows a three-step pipeline:
- Dataset Curation & Reasoning Trace Distillation: curate high-quality economic problems and reasoning traces from benchmarks and teacher models to form the Recon dataset.
- Post-Training via SFT & RL: post-train a base model with Supervised Fine-Tuning (Recon-SFT) followed by Group Relative Policy Optimization (Recon-RL) on the curated dataset (a minimal GRPO sketch follows this list).
- Evaluation: evaluate the resulting models on economic reasoning benchmarks, in self-play, and in multi-agent games against opponent agents.
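As referenced above, GRPO's core idea is to replace a learned value baseline with a group-relative one: sample several completions per prompt, score each with a verifiable reward, and standardize rewards within the group. A minimal sketch of that advantage computation (not the training code shipped in this repo):

```python
import numpy as np

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: standardize each sampled completion's reward within its group."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: four completions sampled for one prompt, scored by a verifiable reward.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# Correct completions get positive advantage, incorrect ones negative; the policy
# gradient then upweights tokens from the positively scored completions.
```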
Repository structure:
- data/ — Datasets and benchmarks for economic reasoning and multi-agent games
- images/ — Figures and diagrams for documentation and papers
- SFT_Training.ipynb — Supervised fine-tuning pipeline
- GRPO_Training.ipynb — GRPO training pipeline
- data_curation.ipynb — Dataset curation and reasoning trace distillation
- vllm_evaluate.ipynb — Model evaluation scripts (see the vLLM sketch below)
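vllm_evaluate.ipynb is described as the evaluation entry point; a minimal sketch of batched inference with vLLM is shown below. The checkpoint path, sampling settings, and prompt are placeholders, not the exact configuration in the notebook.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint path; substitute a Recon-SFT or Recon-RL checkpoint.
llm = LLM(model="path/to/recon-checkpoint")
params = SamplingParams(temperature=0.0, max_tokens=2048)

prompts = [
    "A monopolist faces demand Q = 100 - 2P with constant marginal cost 10. "
    "What price maximizes profit? Put the final answer in \\boxed{}.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```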
Key findings:
- Domain-aligned post-training works: Structured economic reasoning tasks lead to strong generalization, enabling models to excel in both single-agent and multi-agent settings.
- Strategic behavior emerges: Recon models not only solve textbook problems but also display rational, game-theoretic behavior in unseen interactive games.
- SFT + GRPO is effective: The two-stage pipeline yields clear gains in both accuracy and strategic alignment, validating this approach for aligning LLMs with economic and multi-agent objectives.
- Scalable and interpretable alignment: Recon's approach provides a scalable route to agent alignment, with richer, self-correcting reasoning chains that enhance interpretability and facilitate auditing.
If you find Recon helpful for your research, please cite our work:
```bibtex
@article{zhou2025recon,
  title={Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs},
  author={Zhou, Yufa and Wang, Shaobo and Dong, Xingyu and Jin, Xiangqi and Chen, Yifang and Min, Yue and Yang, Kexin and Ren, Xingzhang and Liu, Dayiheng and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2506.00577},
  year={2025},
  url={https://arxiv.org/abs/2506.00577}
}
```
For questions or collaborations, feel free to reach out:
📧 yufa.zhou[at]duke.edu