More Web Proxy on the site http://driver.im/

article

The Complexity of Decentralized Control of Markov Decision Processes

Authors:

Daniel S. Bernstein,

Shlomo ZilbersteinAuthors Info & Claims

Mathematics of Operations Research, Volume 27, Issue 4

Pages 819 - 840

https://doi.org/10.1287/moor.27.4.819.297

Published: 01 November 2002 Publication History

Abstract

We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully observable case and the partially observable case that allow for decentralized control are described. For even two agents, the finite-horizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we considerprovably do not admit polynomial-time algorithms. Furthermore, assuming EXP ? NEXP, the problems require superexponential time to solve in the worst case.

References

[1]

Altman, E. 2001. Applications of Markov decision processes in telecommunications: A survey. E. Feinberg, A. Shwartz, ed. Markov Decision Processes: Models, Methods, Directions, and Open Problems. Kluwer, New York.

[2]

Babai, L, L. Fortnow, C. Lund. 1991. Non-deterministic exponential time has two-prover interactive protocols. Comput. Complexity1 3-40.

[3]

Blondel, V. D., J. N. Tsitsiklis. 2000. A survey of computational complexity results in systems and control. Automatica36(9) 1249-1274.

[4]

Cassandra, A., M. L. Littman, N. L. Zhang. 1997. Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. Proc. Thirteenth Ann. Conf. on Uncertainty in Artificial Intelligence, Providence, RI, 54-61.

[5]

Coradeschi, S., L. Karlsson, P. Stone, T. Balch, G. Kraetzschmar, M. Asada. 2000. Overview of RoboCup-99. AI Magazine21(3) 11-18.

[6]

Hansen, E. 1998. Solving POMDPs by searching in policy space. In Proc. Fourteenth Ann. Conf. on Uncertainty in Artificial Intelligence, Madison, WI, 211-219.

[7]

Hsu, K. S. I. Marcus. 1982. Decentralized control of finite state Markov processes. IEEE Trans. Auto. ControlAC-27(2) 426--431.

[8]

Jaakkola. T., S. P. Singh, M. I. Jordan. 1995. Reinforcement learning algorithm for partially observable Markov decision problems. Proc. Adv. in Neural Inform. Processing Systems7 345-352.

[9]

Kaelbling, L. P., M. L. Littman, A. R. Cassandra. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence101(1-2), 99-134.

[10]

Lewis, H. 1978. Complexity of solvable cases of the decision problem for predicate calculus. Proc. Nineteenth Sympos. on the Foundations of Comput. Sci., Ann Arbor, MI, 35-47.

[11]

Lusena. C., J. Goldsmith. T. Li, S. Sittinger, C. Wells. 1999. My brain is full: When more memory helps. Proc. Fifteenth Conf. on Uncertainty in Artificial Intelligence, Stockholm, Sweden, 374-381.

[12]

Madani, O., S. Hanks, A. Condon. 1999. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. Proc. Sixteenth National Conf. on Artificial Intelligence, Orlando, FL, 541-548.

[13]

Meuleau, N. K-E. Kim, L. Kaelbling, A. R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. Proc. Fifteenth Conf. on Uncertainty in Artificial Intelligence. Stockholm, Sweden. 417-426

[14]

Mundhenk, M., J. Goldsmith, C. Lusena. E. Allender. 2000. Complexity of finite-horizon Markov decision process problems. J. ACM47(4) 681-720.

[15]

Ooi, J. M., G. W. Wornell. 1996. Decentralized control of a multiple access broadcast channel: Performance bounds. In Proc. 35th Conf. on Decision and Control. Kobe, Japan, 293-298.

[16]

Papadimitriou, C. H. 1994. Computational Complexity. Addison-Wesley, Reading, MA.

[17]

Papadimitriou, C. H., J. Tsitsiklis. 1982. On the complexity of designing distributed protocols. Inform. Control53 211-218.

[18]

Papadimitriou, C. H., J. Tsitsiklis. 1986. Intractable problems in control theory. SIAM J. Control Optim.24(4) 639-654.

Digital Library

[19]

Papadimitriou, C. H., J. Tsitsiklis. 1987. The complexity of Markov decision processes. Math. Oper. Res12(3) 441--450.

[20]

Peshkin, L., K.-E. Kim, N. Meuleau, L. P. Kaelbling. 2000. Learning to cooperate via policy search. Proc. Sixteenth Internat. Conf. on Uncertainty in Artificial Intelligence, Stanford. CA. 489-496.

[21]

Peterson, G. L. J. R. Reif. 1979. Multiple-person alternation. 20th Ann. Sympos. on Foundations of Comput. Sci., San Juan, PR. 348-363.

[22]

Puterman, M. L. 1994. Markov Decision Processes. J. Wiley & Sons, New York.

[23]

Schneider, J., W.-K. Wong, A. Moore, M. Riedmiller. 1999. Distributed value functions. Proc. Sixteenth Internat. Conf. on Machine Learning, Bled. Slovenia, 371-378.

[24]

Zhang, W. 2001. Algorithms for partially observable markov decision processes. Ph.D. thesis. Hong Kong University of Science and Technology, Kowloon, Hong Kong.

Cited By

Sun JKou JHou WBai Y(2025)A multi-agent curiosity reward model for task-oriented dialogue systemsPattern Recognition10.1016/j.patcog.2024.110884157:COnline publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1016/j.patcog.2024.110884
Miura SZilberstein SDastani MSichman JAlechina NDignum V(2024)Observer-Aware Planning with Implicit and Explicit CommunicationProceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems10.5555/3635637.3663000(1409-1417)Online publication date: 6-May-2024
https://dl.acm.org/doi/10.5555/3635637.3663000
Wu YXiang YBaker TTong EZhu YCui XZhang ZHan ZLiu JNiu W(2024)Collaborative Attack Sequence Generation Model Based on Multiagent Reinforcement Learning for Intelligent Traffic Signal SystemInternational Journal of Intelligent Systems10.1155/2024/47340302024Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1155/2024/4734030
Show More Cited By

Index Terms

The Complexity of Decentralized Control of Markov Decision Processes
1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
      1. Markov networks
    2. Stochastic processes
      1. Markov processes
2. Theory of computation

Recommendations

The complexity of decentralized control of Markov decision processes
UAI'00: Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence

Planning for distributed agents with partial state information is considered from a decisiontheoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the ...
The Complexity of Markov Decision Processes

We investigate the complexity of the classical problem of optimal policy computation in Markov decision processes. All three variants of the problem finite horizon, infinite horizon discounted, and infinite horizon average cost were known to be solvable ...
The complexity of Boolean formula minimization

The Minimum Equivalent Expression problem is a natural optimization problem in the second level of the Polynomial-Time Hierarchy. It has long been conjectured to be @S"2^P-complete and indeed appears as an open problem in Garey and Johnson (1979) [5]. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Mathematics of Operations Research

Mathematics of Operations Research Volume 27, Issue 4

November 2002

205 pages

ISSN:0364-765X

Issue’s Table of Contents

Publisher

INFORMS

Linthicum, MD, United States

Publication History

Published: 01 November 2002

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

329
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 24 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Sun JKou JHou WBai Y(2025)A multi-agent curiosity reward model for task-oriented dialogue systemsPattern Recognition10.1016/j.patcog.2024.110884157:COnline publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1016/j.patcog.2024.110884
Miura SZilberstein SDastani MSichman JAlechina NDignum V(2024)Observer-Aware Planning with Implicit and Explicit CommunicationProceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems10.5555/3635637.3663000(1409-1417)Online publication date: 6-May-2024
https://dl.acm.org/doi/10.5555/3635637.3663000
Wu YXiang YBaker TTong EZhu YCui XZhang ZHan ZLiu JNiu W(2024)Collaborative Attack Sequence Generation Model Based on Multiagent Reinforcement Learning for Intelligent Traffic Signal SystemInternational Journal of Intelligent Systems10.1155/2024/47340302024Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1155/2024/4734030
Nguyen HVo BVo BRezatofighi HRanasinghe D(2024)Multi-Objective Multi-Agent Planning for Discovering and Tracking Multiple Mobile ObjectsIEEE Transactions on Signal Processing10.1109/TSP.2024.342375572(3669-3685)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1109/TSP.2024.3423755
Yan YShen Y(2024)Distributed Policy Gradient for Linear Quadratic Networked Control With Limited Communication RangeIEEE Transactions on Signal Processing10.1109/TSP.2024.338639672(2087-2100)Online publication date: 8-Apr-2024
https://dl.acm.org/doi/10.1109/TSP.2024.3386396
Fujii KTakeishi NKawahara YTakeda K(2024)Decentralized policy learning with partial observation and mechanical constraints for multiperson modelingNeural Networks10.1016/j.neunet.2023.11.068171:C(40-52)Online publication date: 17-Apr-2024
https://dl.acm.org/doi/10.1016/j.neunet.2023.11.068
Cui TYang XJia FJin JYe YBai R(2024)Mobile robot sequential decision making using a deep reinforcement learning hyper-heuristic approachExpert Systems with Applications: An International Journal10.1016/j.eswa.2024.124959257:COnline publication date: 10-Dec-2024
https://dl.acm.org/doi/10.1016/j.eswa.2024.124959
Niu YYan XWang YNiu Y(2024)An improved sand cat swarm optimization for moving target search by UAVExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.122189238:PEOnline publication date: 27-Feb-2024
https://dl.acm.org/doi/10.1016/j.eswa.2023.122189
Haupt AChristoffersen PDamani MHadfield-Menell D(2024)Formal contracts mitigate social dilemmas in multi-agent reinforcement learningAutonomous Agents and Multi-Agent Systems10.1007/s10458-024-09682-538:2Online publication date: 18-Oct-2024
https://dl.acm.org/doi/10.1007/s10458-024-09682-5
Fang ZZhao JYang MLu ZZhou WLi H(2024)Coordinate-aligned multi-camera collaboration for active multi-object trackingMultimedia Systems10.1007/s00530-024-01420-x30:4Online publication date: 29-Jul-2024
https://dl.acm.org/doi/10.1007/s00530-024-01420-x
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents