More Web Proxy on the site http://driver.im/

research-article

Open access

Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities

Authors:

Eytan ModianoAuthors Info & Claims

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 3

Article No.: 58, Pages 1 - 32

https://doi.org/10.1145/3570619

Published: 08 December 2022 Publication History

Abstract

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher u_n at server s_m, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities.

We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Joint-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T)-regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.

References

[1]

J. Tsitsiklis and K. Xu. "Queueing system topologies with limited flexibility." in Proceedings of the ACM SIGMETRICS, pp: 167--178. 2013.

[2]

W. Weng, X. Zhou, and R. Srikant. "Optimal load balancing in bipartite graphs." arXiv preprint arXiv:2008.08830 (2020).

[3]

S. Banerjee, Y. Kanoria, and P. Qian. "State dependent control of closed queueing networks." in ACM SIGMETRICS Performance Evaluation Review, Vol. 46, No. 1, pp: 2--4, 2018.

[4]

A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy. "Optimal power allocation in server farms." in ACM SIGMETRICS Performance Evaluation Review, Vol. 37, No. 1, pp: 157--168, 2009.

[5]

H. Yu, M. Neely, and X. Wei. "Online convex optimization with stochastic constraints." in Advances in Neural Information Processing Systems, 2017.

[6]

S. Shalev-Shwartz, "Online learning and online convex optimization." in Foundations and Trends in Machine Learning, Vol. 4, No. 2, pp: 107--194, 2012.

[7]

A. Agarwal, O. Dekel, and L. Xiao. "Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback." in Conference on Learning Theory, pp: 28--40, 2010.

[8]

T. Chen and G. B. Giannakis. "Bandit convex optimization for scalable and dynamic IoT management." in IEEE Internet of Things Journal, Vol. 6, No. 1, pp: 1276--1286, 2018.

[9]

A. Agarwal, D. P. Foster, D. Hsu, S. M. Kakade, and A. Rakhlin. "Stochastic convex optimization with bandit feedback." in SIAM Journal on Optimization, Vol. 23, No. 1, pp: 213--240, 2013.

Digital Library

[10]

J. Abernethy, E. Hazan, and A. Rakhlin. "Competing in the dark: An efficient algorithm for bandit linear optimization." 2009.

[11]

S. Bubeck and N. Cesa-Bianchi, "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems." in Machine Learning, Vol. 5, No. 1, pp: 1--122, 2012.

[12]

J. He, D. Zhou, and Q. Gu. "Logarithmic regret for reinforcement learning with linear function approximation." in International Conference on Machine Learning, pp: 4171--4180, 2021.

[13]

L. Zheng and L. Ratliff. "Constrained upper confidence reinforcement learning." in Learning for Dynamics and Control, pp: 620--629, 2020.

[14]

I. Osband and B. Van Roy. "Near-optimal reinforcement learning in factored mdps." in Advances in Neural Information Processing Systems, 2014.

[15]

M. Azar, I. Osband, and R. Munos. "Minimax regret bounds for reinforcement learning." in International Conference on Machine Learning, pp: 263--272, 2017.

[16]

L. Yang and M. Wang. "Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound." in International Conference on Machine Learning, pp: 10746--10756, 2020.

[17]

V. Dani, T. Hayes, and S. Kakade. "Stochastic linear optimization under bandit feedback.", in Conference on Learning Theory, 2008.

[18]

R. Singh and A. Stolyar. "Maxweight scheduling: Asymptotic behavior of unscaled queue-differentials in heavy traffic." in Proceedings of ACM SIGMETRICS, pp: 431--432, 2015.

[19]

M. Neely. "Stochastic network optimization with application to communication and queueing systems." in Synthesis Lectures on Communication Networks, Vol. 3, No. 1, pp: 1--211, 2010.

[20]

X. Lin, N. B. Shroff, and R. Srikant. "A tutorial on cross-layer optimization in wireless networks." in IEEE Journal on Selected areas in Communications, Vol. 24, No. 8, pp: 1452--1463, 2006.

Digital Library

[21]

T. Stahlbuhk, B. Shrader, and E. Modiano. "Learning algorithms for minimizing queue length regret." in IEEE Transactions on Information Theory, Vol. 67, No. 3, pp: 1759--1781, 2021.

Digital Library

[22]

S. Krishnasamy, R. Sen, R. Johari, and S. Shakkottai. "Learning unknown service rates in queues: A multiarmed bandit approach." in Operations Research, Vol. 69, No. 1, pp: 315--330, 2021.

[23]

X. Fu, and E. Modiano. "Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing Delay." in Proceedings of ACM Mobihoc, pp. 21--30. 2021.

[24]

X. Fu, and E. Modiano. "Elastic job scheduling with unknown utility functions." in Performance Evaluation, 2021.

[25]

A. Vera and S. Banerjee. "The bayesian prophet: A low-regret framework for online decision making." in Management Science, Vol. 67, No. 3, pp: 1368--1391, 2021

[26]

X. Tan, B. Sun, A. Leon-Garcia, Y. Wu, and D. Tsang. "Mechanism design for online resource allocation: A unified approach." in Proceedings of the ACM on Measurement and Analysis of Computing Systems, Vol. 4, No. 2, pp: 1--46, 2020.

[27]

Agrawal, Shipra, Zizhuo Wang, and Yinyu Ye. "A dynamic near-optimal algorithm for online linear programming." Operations Research 62, no. 4 (2014): 876--890.

Digital Library

[28]

M. Neely. "Super-fast delay tradeoffs for utility optimal fair scheduling in wireless networks." in IEEE Journal on Selected Areas in Communications, Vol. 24, No. 8, pp: 1489--1501, 2006.

[29]

D. Bertsimas and J. Tsitsiklis. "Introduction to linear optimization." Vol. 6. Belmont, MA: Athena Scientific, 1997.

Digital Library

[30]

V. Pesic and R. Williams. "Dynamic scheduling for parallel server systems in heavy traffic: Graphical structure, decoupled workload matrix and some sufficient conditions for solvability of the Brownian control problem." in Stochastic Systems, Vol. 6, No. 1, pp: 26--89, 2016.

[31]

J. M. Harrison and M. J. López. "Heavy traffic resource pooling in parallel-server systems." in Queueing systems, Vol. 33, No. 4, pp: 339--368, 1999.

[32]

E. Hazan. "Introduction to online convex optimization." in Foundations and Trends in Optimization, Vol. 2, No. 3--4, pp: 157--325, 2016.

Cited By

Dai YHuang L(2024)Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop NetworksProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/37004138:3(1-48)Online publication date: 10-Dec-2024
https://dl.acm.org/doi/10.1145/3700413
Liu XWei HYing LPapavassiliou SSchmid S(2024)Optimistic Joint Flow Control and Link Scheduling with Unknown Utility FunctionsProceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing10.1145/3641512.3686389(271-280)Online publication date: 14-Oct-2024
https://dl.acm.org/doi/10.1145/3641512.3686389
Xie RGrosof IScully Z(2024)Heavy-Traffic Optimal Size- and State-Aware DispatchingProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390358:1(1-36)Online publication date: 21-Feb-2024
https://dl.acm.org/doi/10.1145/3639035
Show More Cited By

Index Terms

Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities
1. Networks
  1. Network algorithms
2. Theory of computation
  1. Theory and algorithms for application domains

Recommendations

Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities
SIGMETRICS '23

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival ...
Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities
SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival ...
Stochastic Control and Stability for Queueing Networks in Random Environments

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems

Proceedings of the ACM on Measurement and Analysis of Computing Systems Volume 6, Issue 3

POMACS

December 2022

534 pages

EISSN:2476-1249

DOI:10.1145/3576048

Editors:
Augustin Chaintreau
Columbia University
,
Leana Golubchik
University of Southern California, United States
,
Zhi-Li Zhang
University of Minnesota, United States

Issue’s Table of Contents

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022

Published in POMACS Volume 6, Issue 3

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
587
Total Downloads

Downloads (Last 12 months)217
Downloads (Last 6 weeks)16

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Dai YHuang L(2024)Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop NetworksProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/37004138:3(1-48)Online publication date: 10-Dec-2024
https://dl.acm.org/doi/10.1145/3700413
Liu XWei HYing LPapavassiliou SSchmid S(2024)Optimistic Joint Flow Control and Link Scheduling with Unknown Utility FunctionsProceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing10.1145/3641512.3686389(271-280)Online publication date: 14-Oct-2024
https://dl.acm.org/doi/10.1145/3641512.3686389
Xie RGrosof IScully Z(2024)Heavy-Traffic Optimal Size- and State-Aware DispatchingProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390358:1(1-36)Online publication date: 21-Feb-2024
https://dl.acm.org/doi/10.1145/3639035
Steiger JLi BLu N(2024)Backlogged Bandits: Cost-Effective Learning for Utility Maximization in Queueing NetworksIEEE INFOCOM 2024 - IEEE Conference on Computer Communications10.1109/INFOCOM52122.2024.10621295(381-390)Online publication date: 20-May-2024
https://doi.org/10.1109/INFOCOM52122.2024.10621295
Lefebvre MYaghoubi R(2024)Optimal control of a queueing systemOptimization10.1080/02331934.2024.2422040(1-14)Online publication date: 30-Oct-2024
https://doi.org/10.1080/02331934.2024.2422040
Lefebvre M(2023)Reducing the Size of a Waiting Line OptimallyWSEAS TRANSACTIONS ON SYSTEMS AND CONTROL10.37394/23203.2023.18.3518(342-345)Online publication date: 27-Oct-2023
https://doi.org/10.37394/23203.2023.18.35

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Figures

Tables

Media

View Issue’s Table of Contents