Improving multi-agent reinforcement learning with stable prefix policy

Published: 03 August 2024

Abstract

In multi-agent reinforcement learning (MARL), the ε-greedy method plays an important role in balancing exploration and exploitation during decision making in value-based algorithms. However, ε-greedy exploration introduces conservativeness into the estimated expected state value precisely when the agents most need exploitation, i.e., as the policy approaches convergence, which may result in convergence to a suboptimal policy. Conversely, removing ε-greedy leaves no exploration at all and may lead to unacceptable locally optimal policies. To address this dilemma, we use previously collected trajectories to construct a Monte-Carlo Trajectory Tree, from which an existing optimal template, a sequence of state prototypes, can be planned. The agents start by following the planned template and acting according to the policy without exploration, which we call the Stable Prefix Policy. When the policy still needs exploration, the agents adaptively drop out of the template and resume exploring with the ε-greedy method. We scale our approach to various value-based MARL methods and empirically verify it on a cooperative MARL task, the SMAC benchmark. Experimental results demonstrate that our method achieves not only better performance but also faster convergence than baseline algorithms within early time steps.
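
The action-selection rule described above can be sketched in a few lines. The Python snippet below is a minimal, hypothetical illustration of that control flow, assuming per-agent Q-values, a planned prefix of state prototypes, and a cosine-similarity test for deciding whether the current state still matches the template; the helper names (similarity, state_prototypes, match_threshold) are illustrative assumptions, not the authors' implementation.

    import math
    import random
    from typing import List, Sequence

    def similarity(state: Sequence[float], prototype: Sequence[float]) -> float:
        """Cosine similarity between a state feature vector and a state prototype."""
        dot = sum(s * p for s, p in zip(state, prototype))
        norm = math.sqrt(sum(s * s for s in state)) * math.sqrt(sum(p * p for p in prototype))
        return dot / norm if norm > 0 else 0.0

    def select_action(q_values: List[float],
                      state: Sequence[float],
                      state_prototypes: List[Sequence[float]],
                      step: int,
                      epsilon: float,
                      match_threshold: float = 0.9) -> int:
        """Follow the planned prefix greedily while the current state still matches
        the next state prototype; otherwise drop out and fall back to ε-greedy."""
        greedy = max(range(len(q_values)), key=lambda a: q_values[a])
        # Stable Prefix Policy: along the planned template, exploit without exploration.
        if step < len(state_prototypes) and similarity(state, state_prototypes[step]) >= match_threshold:
            return greedy
        # Adaptive dropout: outside the stable prefix the policy still needs
        # exploration, so fall back to the usual ε-greedy rule.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return greedy

In a full system this rule would be applied per agent at each timestep, with the prefix of state prototypes replanned from the Monte-Carlo Trajectory Tree as new trajectories are collected.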


Published In

IJCAI '24: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
August 2024
8859 pages
ISBN: 978-1-956792-04-1

Sponsors

  • International Joint Conferences on Artificial Intelligence (IJCAI)

Publication History

Published: 03 August 2024

Qualifiers

  • Research-article
  • Research
  • Refereed limited
