DOI: 10.1609/aaai.v37i6.25829

Fairness and welfare quantification for regret in multi-armed bandits

Published: 07 February 2023

Abstract

We extend the notion of regret with a welfarist perspective. Focusing on the classic multi-armed bandit (MAB) framework, the current work quantifies the performance of bandit algorithms by applying a fundamental welfare function, namely the Nash social welfare (NSW) function. This corresponds to equating an algorithm's performance to the geometric mean of its expected rewards, and it leads us to the study of Nash regret, defined as the difference between the a priori unknown optimal mean (among the arms) and the algorithm's performance. Since NSW is known to satisfy fairness axioms, our approach complements the utilitarian considerations of average (cumulative) regret, wherein the algorithm is evaluated via the arithmetic mean of its expected rewards.
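
Concretely, writing $\mu^*$ for the (a priori unknown) optimal mean among the arms, $T$ for the horizon of play, and $\mathbb{E}[X_t]$ for the algorithm's expected reward in round $t$ (this notation is ours, introduced only to make the abstract's description explicit), the two notions can be written as

$$ \overline{R}_T = \mu^* - \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[X_t] \quad \text{(average regret)}, \qquad R^{\mathrm{Nash}}_T = \mu^* - \Big(\prod_{t=1}^{T} \mathbb{E}[X_t]\Big)^{1/T} \quad \text{(Nash regret)}. $$
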
This work develops an algorithm that, given the horizon of play $T$, achieves a Nash regret of $O\left(\sqrt{\frac{k \log T}{T}}\right)$, where $k$ denotes the number of arms in the MAB instance. Since, for any algorithm, the Nash regret is at least as large as its average regret (by the AM-GM inequality), the known lower bound on average regret holds for Nash regret as well. Therefore, our Nash regret guarantee is essentially tight. In addition, we develop an anytime algorithm with a Nash regret guarantee of $O\left(\sqrt{\frac{k \log T}{T}} \log T\right)$.
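
The lower-bound transfer claimed above is a direct consequence of the AM-GM inequality applied to the quantities written out after the first paragraph (again in our notation): since the geometric mean never exceeds the arithmetic mean,

$$ \Big(\prod_{t=1}^{T} \mathbb{E}[X_t]\Big)^{1/T} \le \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[X_t] \quad\Longrightarrow\quad R^{\mathrm{Nash}}_T \ge \overline{R}_T, $$

so any worst-case lower bound on average regret, in particular the standard $\Omega\big(\sqrt{k/T}\big)$ bound for $k$-armed instances, applies verbatim to Nash regret.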

Cited By

  • (2024) Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 1576-1584. DOI: 10.5555/3635637.3663018. Online publication date: 6 May 2024.

Published In

AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence
February 2023
16496 pages
ISBN:978-1-57735-880-0

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
