[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities

Published: 08 December 2022 Publication History

Abstract

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher un at server sm, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities.
We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Joint-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T)-regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.

References

[1]
J. Tsitsiklis and K. Xu. "Queueing system topologies with limited flexibility." in Proceedings of the ACM SIGMETRICS, pp: 167--178. 2013.
[2]
W. Weng, X. Zhou, and R. Srikant. "Optimal load balancing in bipartite graphs." arXiv preprint arXiv:2008.08830 (2020).
[3]
S. Banerjee, Y. Kanoria, and P. Qian. "State dependent control of closed queueing networks." in ACM SIGMETRICS Performance Evaluation Review, Vol. 46, No. 1, pp: 2--4, 2018.
[4]
A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy. "Optimal power allocation in server farms." in ACM SIGMETRICS Performance Evaluation Review, Vol. 37, No. 1, pp: 157--168, 2009.
[5]
H. Yu, M. Neely, and X. Wei. "Online convex optimization with stochastic constraints." in Advances in Neural Information Processing Systems, 2017.
[6]
S. Shalev-Shwartz, "Online learning and online convex optimization." in Foundations and Trends in Machine Learning, Vol. 4, No. 2, pp: 107--194, 2012.
[7]
A. Agarwal, O. Dekel, and L. Xiao. "Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback." in Conference on Learning Theory, pp: 28--40, 2010.
[8]
T. Chen and G. B. Giannakis. "Bandit convex optimization for scalable and dynamic IoT management." in IEEE Internet of Things Journal, Vol. 6, No. 1, pp: 1276--1286, 2018.
[9]
A. Agarwal, D. P. Foster, D. Hsu, S. M. Kakade, and A. Rakhlin. "Stochastic convex optimization with bandit feedback." in SIAM Journal on Optimization, Vol. 23, No. 1, pp: 213--240, 2013.
[10]
J. Abernethy, E. Hazan, and A. Rakhlin. "Competing in the dark: An efficient algorithm for bandit linear optimization." 2009.
[11]
S. Bubeck and N. Cesa-Bianchi, "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems." in Machine Learning, Vol. 5, No. 1, pp: 1--122, 2012.
[12]
J. He, D. Zhou, and Q. Gu. "Logarithmic regret for reinforcement learning with linear function approximation." in International Conference on Machine Learning, pp: 4171--4180, 2021.
[13]
L. Zheng and L. Ratliff. "Constrained upper confidence reinforcement learning." in Learning for Dynamics and Control, pp: 620--629, 2020.
[14]
I. Osband and B. Van Roy. "Near-optimal reinforcement learning in factored mdps." in Advances in Neural Information Processing Systems, 2014.
[15]
M. Azar, I. Osband, and R. Munos. "Minimax regret bounds for reinforcement learning." in International Conference on Machine Learning, pp: 263--272, 2017.
[16]
L. Yang and M. Wang. "Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound." in International Conference on Machine Learning, pp: 10746--10756, 2020.
[17]
V. Dani, T. Hayes, and S. Kakade. "Stochastic linear optimization under bandit feedback.", in Conference on Learning Theory, 2008.
[18]
R. Singh and A. Stolyar. "Maxweight scheduling: Asymptotic behavior of unscaled queue-differentials in heavy traffic." in Proceedings of ACM SIGMETRICS, pp: 431--432, 2015.
[19]
M. Neely. "Stochastic network optimization with application to communication and queueing systems." in Synthesis Lectures on Communication Networks, Vol. 3, No. 1, pp: 1--211, 2010.
[20]
X. Lin, N. B. Shroff, and R. Srikant. "A tutorial on cross-layer optimization in wireless networks." in IEEE Journal on Selected areas in Communications, Vol. 24, No. 8, pp: 1452--1463, 2006.
[21]
T. Stahlbuhk, B. Shrader, and E. Modiano. "Learning algorithms for minimizing queue length regret." in IEEE Transactions on Information Theory, Vol. 67, No. 3, pp: 1759--1781, 2021.
[22]
S. Krishnasamy, R. Sen, R. Johari, and S. Shakkottai. "Learning unknown service rates in queues: A multiarmed bandit approach." in Operations Research, Vol. 69, No. 1, pp: 315--330, 2021.
[23]
X. Fu, and E. Modiano. "Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing Delay." in Proceedings of ACM Mobihoc, pp. 21--30. 2021.
[24]
X. Fu, and E. Modiano. "Elastic job scheduling with unknown utility functions." in Performance Evaluation, 2021.
[25]
A. Vera and S. Banerjee. "The bayesian prophet: A low-regret framework for online decision making." in Management Science, Vol. 67, No. 3, pp: 1368--1391, 2021
[26]
X. Tan, B. Sun, A. Leon-Garcia, Y. Wu, and D. Tsang. "Mechanism design for online resource allocation: A unified approach." in Proceedings of the ACM on Measurement and Analysis of Computing Systems, Vol. 4, No. 2, pp: 1--46, 2020.
[27]
Agrawal, Shipra, Zizhuo Wang, and Yinyu Ye. "A dynamic near-optimal algorithm for online linear programming." Operations Research 62, no. 4 (2014): 876--890.
[28]
M. Neely. "Super-fast delay tradeoffs for utility optimal fair scheduling in wireless networks." in IEEE Journal on Selected Areas in Communications, Vol. 24, No. 8, pp: 1489--1501, 2006.
[29]
D. Bertsimas and J. Tsitsiklis. "Introduction to linear optimization." Vol. 6. Belmont, MA: Athena Scientific, 1997.
[30]
V. Pesic and R. Williams. "Dynamic scheduling for parallel server systems in heavy traffic: Graphical structure, decoupled workload matrix and some sufficient conditions for solvability of the Brownian control problem." in Stochastic Systems, Vol. 6, No. 1, pp: 26--89, 2016.
[31]
J. M. Harrison and M. J. López. "Heavy traffic resource pooling in parallel-server systems." in Queueing systems, Vol. 33, No. 4, pp: 339--368, 1999.
[32]
E. Hazan. "Introduction to online convex optimization." in Foundations and Trends in Optimization, Vol. 2, No. 3--4, pp: 157--325, 2016.

Cited By

View all
  • (2024)Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop NetworksProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/37004138:3(1-48)Online publication date: 10-Dec-2024
  • (2024)Optimistic Joint Flow Control and Link Scheduling with Unknown Utility FunctionsProceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing10.1145/3641512.3686389(271-280)Online publication date: 14-Oct-2024
  • (2024)Heavy-Traffic Optimal Size- and State-Aware DispatchingProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390358:1(1-36)Online publication date: 21-Feb-2024
  • Show More Cited By

Index Terms

  1. Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems
      Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 6, Issue 3
      POMACS
      December 2022
      534 pages
      EISSN:2476-1249
      DOI:10.1145/3576048
      Issue’s Table of Contents
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 December 2022
      Published in POMACS Volume 6, Issue 3

      Check for updates

      Author Tags

      1. learning for network control
      2. stochastic queueing networks

      Qualifiers

      • Research-article

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)252
      • Downloads (Last 6 weeks)20
      Reflects downloads up to 09 Jan 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop NetworksProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/37004138:3(1-48)Online publication date: 10-Dec-2024
      • (2024)Optimistic Joint Flow Control and Link Scheduling with Unknown Utility FunctionsProceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing10.1145/3641512.3686389(271-280)Online publication date: 14-Oct-2024
      • (2024)Heavy-Traffic Optimal Size- and State-Aware DispatchingProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390358:1(1-36)Online publication date: 21-Feb-2024
      • (2024)Backlogged Bandits: Cost-Effective Learning for Utility Maximization in Queueing NetworksIEEE INFOCOM 2024 - IEEE Conference on Computer Communications10.1109/INFOCOM52122.2024.10621295(381-390)Online publication date: 20-May-2024
      • (2024)Optimal control of a queueing systemOptimization10.1080/02331934.2024.2422040(1-14)Online publication date: 30-Oct-2024
      • (2023)Reducing the Size of a Waiting Line OptimallyWSEAS TRANSACTIONS ON SYSTEMS AND CONTROL10.37394/23203.2023.18.3518(342-345)Online publication date: 27-Oct-2023

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media