DOI: 10.5555/3535850.3535949

Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

Published: 09 May 2022

Abstract

We propose Streaming Bandits, a Restless Multi-Armed Bandit (RMAB) framework in which heterogeneous arms may arrive and leave the system after staying on for a finite lifetime. Streaming Bandits naturally capture the health-intervention planning problem, where health workers must manage the health outcomes of a patient cohort while new patients join and existing patients leave the cohort each day. Our contributions are as follows: (1) We derive conditions under which our problem satisfies indexability, a pre-condition that guarantees the existence and asymptotic optimality of the Whittle Index solution for RMABs. We establish the conditions using a polytime reduction of the Streaming Bandit setup to regular RMABs. (2) We further prove a phenomenon that we call index decay - whereby the Whittle index values are low for short residual lifetimes - driving the intuition underpinning our algorithm. (3) We propose a novel and efficient algorithm to compute the index-based solution for Streaming Bandits. Unlike previous methods, our algorithm does not rely on solving the costly finite horizon problem on each arm of the RMAB, thereby lowering the computational complexity compared to existing methods. (4) Finally, we evaluate our approach via simulations run on real-world data sets from a tuberculosis patient monitoring task and an intervention planning task for improving maternal healthcare, in addition to other synthetic domains. Across the board, our algorithm achieves a 2-orders-of-magnitude speed-up over existing methods while maintaining the same solution quality. The full paper is available at: https://arxiv.org/pdf/2103.04730.pdf
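To make the index-decay phenomenon concrete, the following is a minimal sketch (not the paper's algorithm, whose point is precisely to avoid the costly per-arm finite-horizon solve) of the naive baseline: computing a finite-horizon Whittle index for a single two-state arm by backward induction plus binary search on the passivity subsidy. The transition matrices and rewards are hypothetical example parameters, not taken from the paper.

```python
import numpy as np

def finite_horizon_q(P, R, lam, h):
    """Q-values at the first of h remaining steps for one two-action arm.
    The passive action (a=0) earns an extra subsidy lam; undiscounted."""
    assert h >= 1
    V = np.zeros(len(R))
    for _ in range(h):                # backward induction over the horizon
        Q0 = R + lam + P[0] @ V      # passive: reward + subsidy + future value
        Q1 = R + P[1] @ V            # active: reward + future value
        V = np.maximum(Q0, Q1)
    return Q0, Q1

def whittle_index(P, R, s, h, lo=-10.0, hi=10.0, iters=60):
    """Binary-search the subsidy at which passive and active are equally
    attractive in state s with h steps of residual lifetime."""
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        Q0, Q1 = finite_horizon_q(P, R, lam, h)
        if Q1[s] > Q0[s]:
            lo = lam                 # acting still preferred: raise subsidy
        else:
            hi = lam
    return 0.5 * (lo + hi)

# Hypothetical two-state arm: state 1 is the "good" state (reward 1);
# acting raises the chance of reaching and staying in it.
P = np.array([[[0.95, 0.05],   # passive transitions from states 0, 1
               [0.40, 0.60]],
              [[0.50, 0.50],   # active transitions from states 0, 1
               [0.10, 0.90]]])
R = np.array([0.0, 1.0])

indices = [whittle_index(P, R, s=0, h=h) for h in (1, 2, 5, 10)]
```

On this example arm, the index in the bad state is 0 at residual lifetime h = 1 (acting cannot pay off without a future step), 0.45 at h = 2, and grows with longer residual lifetimes; read in reverse, the index decays as an arm approaches the end of its lifetime, which is the intuition the paper's algorithm exploits.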

Supplementary Material

ZIP file (fp717aux.pdf.zip): the full paper PDF, including the appendix.


Cited By

  • (2024) Towards Zero Shot Learning in Restless Multi-armed Bandits. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems. 10.5555/3635637.3663246. 2618--2620. Online publication date: 6 May 2024.

Published In

AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
May 2022, 1990 pages
ISBN: 9781450392136

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

  1. finite horizon
  2. intervention planning
  3. restless multi-armed bandits
  4. Whittle index

Qualifiers

  • Research article

Funding Sources

  • Army Research Office

Conference

AAMAS '22

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions, 23%
