More Web Proxy on the site http://driver.im/

research-article

Free access

Efficient Prior-Free Mechanisms for No-Regret Agents

Authors:

Natalie Collina,

Han ShaoAuthors Info & Claims

EC '24: Proceedings of the 25th ACM Conference on Economics and Computation

Pages 511 - 541

https://doi.org/10.1145/3670865.3673460

Published: 17 December 2024 Publication History

Abstract

We study a repeated Principal Agent problem between a long lived Principal and Agent pair in a prior free setting. In our setting, the sequence of realized states of nature may be adversarially chosen, the Agent is non-myopic, and the Principal aims for a strong form of policy regret. Following [Camara et al., 2020], we model the Agent's long-run behavior with behavioral assumptions that relax the common prior assumption (for example, that the Agent has no swap regret). Within this framework, we revisit the mechanism proposed by [Camara et al., 2020], which informally uses calibrated forecasts of the unknown states of nature in place of a common prior. We give two main improvements. First, we give a mechanism that has an exponentially improved dependence (in terms of both running time and regret bounds) on the number of distinct states of nature. To do this, we show that our mechanism does not require truly calibrated forecasts, but rather forecasts that are unbiased subject to only a polynomially sized collection of events --- which can be produced with polynomial overhead. Second, instead of constructing a policy and assuming that the policy is "stable" (informally requiring that under such a policy, all approximately optimal actions lead to approximately the same Principal payoff), we propose a general framework given access to a stable policy oracle. We then instantiate the oracle by developing efficient algorithms in several significant special cases, including the focal linear contracting setting. Taken together, our new mechanism makes the compelling framework proposed by [Camara et al., 2020] more powerful, now able to be realized over polynomially sized state spaces.

References

[1]

Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. 2015. Commitment Without Regrets: Online Learning in Stackelberg Security Games. Proceedings of the Sixteenth ACM Conference on Economics and Computation (2015). https://api.semanticscholar.org/CorpusID:14830193

Digital Library

[2]

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Francesco Trovò, and Nicola Gatti. 2023. Optimal rates and efficient algorithms for online Bayesian persuasion. In International Conference on Machine Learning. PMLR, 2164--2183.

[3]

Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, and Francesco Trovò. 2022. Sequential information design: Learning to persuade in the dark. Advances in Neural Information Processing Systems 35 (2022), 15917--15928.

[4]

Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. 2014. Learning optimal commitment to overcome insecurity. Advances in Neural Information Processing Systems 27 (2014).

[5]

Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. 2008. Regret minimization and the price of total anarchy. In Proceedings of the fortieth annual ACM symposium on Theory of computing. 373--382.

Digital Library

[6]

Avrim Blum and Yishay Mansour. 2007. From external to internal regret. Journal of Machine Learning Research 8, 6 (2007).

[7]

Patrick Bolton and Mathias Dewatripont. 2004. Contract theory. MIT press.

[8]

Mark Braverman, Jieming Mao, Jon Schneider, and Matt Weinberg. 2018. Selling to a no-regret buyer. In Proceedings of the 2018 ACM Conference on Economics and Computation. 523--538.

Digital Library

[9]

William Brown, Jon Schneider, and Kiran Vodrahalli. 2023. Is Learning in Games Good for the Learners? arXiv preprint arXiv:2305.19496 (2023).

[10]

Linda Cai, S Matthew Weinberg, Evan Wildenhain, and Shirley Zhang. 2023. Selling to Multiple No-Regret Buyers. arXiv preprint arXiv:2307.04175 (2023).

[11]

Modibo K Camara, Jason D Hartline, and Aleck Johnsen. 2020. Mechanisms for a no-regret agent: Beyond the common prior. In 2020 ieee 61st annual symposium on foundations of computer science (focs). IEEE, 259--270.

[12]

Gabriel Carroll. 2015. Robustness and linear contracts. American Economic Review 105, 2 (2015), 536--563.

[13]

Gabriel Carroll. 2021. Contract Theory. (2021).

[14]

Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. 2021. Bayesian Agency: Linear versus Tractable Contracts. arXiv e-prints (2021), arXiv-2106.

[15]

Sylvain Chassang. 2013. Calibrated incentive contracts. Econometrica 81, 5 (2013), 1935--1971.

[16]

Yiling Chen, Yang Liu, and Chara Podimata. 2020. Learning strategy-aware linear classifiers. Advances in Neural Information Processing Systems 33 (2020), 15265--15276.

[17]

Alon Cohen, Argyrios Deligkas, and Moran Koren. 2022. Learning approximately optimal contracts. In International Symposium on Algorithmic Game Theory. Springer, 331--346.

Digital Library

[18]

Lee Cohen and Yishay Mansour. 2019. Optimal algorithm for bayesian incentive-compatible exploration. In Proceedings of the 2019 ACM Conference on Economics and Computation. 135--151.

Digital Library

[19]

Natalie Collina, Eshwar Ram Arunachaleswaran, and Michael Kearns. 2023. Efficient Stackelberg Strategies for Finitely Repeated Games. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. 643--651.

Digital Library

[20]

Yuan Deng, Jon Schneider, and Balusubramanian Sivan. 2019. Strategizing against No-regret Learners. arXiv:1909.13861 [cs.GT]

[21]

Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. 2018. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation. 55--70.

Digital Library

[22]

Shaddin Dughmi and Haifeng Xu. 2016. Algorithmic bayesian persuasion. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 412--425.

Digital Library

[23]

Paul Dütting, Tomer Ezra, Michal Feldman, and Thomas Kesselheim. 2022. Combinatorial contracts. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 815--826.

[24]

Paul Dütting, Tim Roughgarden, and Inbal Talgam-Cohen. 2019. Simple versus Optimal Contracts. In Proceedings of the 2019 ACM Conference on Economics and Computation (Phoenix, AZ, USA) (EC '19). Association for Computing Machinery, New York, NY, USA, 369--387.

Digital Library

[25]

Dean P Foster and Sergiu Hart. 2018. Smooth calibration, leaky forecasts, finite recall, and nash dynamics. Games and Economic Behavior 109 (2018), 271--293.

[26]

Dean P Foster and Rakesh Vohra. 1999. Regret in the on-line decision problem. Games and Economic Behavior 29, 1--2 (1999), 7--35.

[27]

Dean P Foster and Rakesh V Vohra. 1998. Asymptotic calibration. Biometrika 85, 2 (1998), 379--390.

[28]

Jiarui Gan, Rupak Majumdar, Goran Radanovic, and Adish Singla. 2022. Bayesian persuasion in sequential decision-making. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 5025--5033.

[29]

Sumegha Garg, Christopher Jung, Omer Reingold, and Aaron Roth. 2024. Oracle Efficient Online Multicalibration and Omniprediction. In ACM-SIAM Symposium on Discrete Algorithms.

[30]

Ira Globus-Harris, Declan Harrison, Michael Kearns, Aaron Roth, and Jessica Sorrell. 2023. Multicalibration as Boosting for Regression. In International Conference on Machine Learning, ICML 2023, 23--29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.). PMLR, 11459--11492. https://proceedings.mlr.press/v202/globus-harris23a.html

[31]

Eyal Gofer and Yishay Mansour. 2016. Lower Bounds on Individual Sequence Regret. Mach. Learn. 103, 1 (apr 2016), 1--26.

Digital Library

[32]

Parikshit Gopalan, Lunjia Hu, Michael P Kim, Omer Reingold, and Udi Wieder. 2023a. Loss Minimization Through the Lens Of Outcome Indistinguishability. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

[33]

Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. 2022. Omnipredictors. In 13th Innovations in Theoretical Computer Science Conference, ITCS 2022, January 31 - February 3, 2022, Berkeley, CA, USA (LIPIcs, Vol. 215), Mark Braverman (Ed.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 79:1--79:21.

[34]

Parikshit Gopalan, Michael P Kim, and Omer Reingold. 2023b. Characterizing notions of omniprediction via multicalibration. arXiv preprint arXiv:2302.06726 (2023).

[35]

Sanford J Grossman and Oliver D Hart. 1992. An analysis of the principal-agent problem. In Foundations of Insurance Economics: Readings in Economics and Finance. Springer, 302--340.

[36]

Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, and Alexander Wei. 2022. Learning in stackelberg games with non-myopic agents. In Proceedings of the 23rd ACM Conference on Economics and Computation. 917--918.

Digital Library

[37]

Nika Haghtalab, Chara Podimata, and Kunhe Yang. 2023. Calibrated Stackelberg Games: Learning Optimal Commitments Against Calibrated Agents. arXiv:2306.02704 [cs.GT]

[38]

Ursula Hébert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. 2018. Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning. PMLR, 1939--1948.

[39]

Chien-Ju Ho, Aleksandrs Slivkins, and Jennifer Wortman Vaughan. 2014. Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. In Proceedings of the fifteenth ACM conference on Economics and computation. 359--376.

Digital Library

[40]

Bengt Holmström. 1979. Moral hazard and observability. The Bell journal of economics (1979), 74--91.

[41]

Bengt Holmstrom and Paul Milgrom. 1987. Aggregation and linearity in the provision of intertemporal incentives. Econometrica: Journal of the Econometric Society (1987), 303--328.

[42]

Sham M Kakade and Dean P Foster. 2008. Deterministic calibration and Nash equilibrium. J. Comput. System Sci. 74, 1 (2008), 115--130.

Digital Library

[43]

Emir Kamenica and Matthew Gentzkow. 2011. Bayesian persuasion. American Economic Review 101, 6 (2011), 2590--2615.

[44]

Yoav Kolumbus and Noam Nisan. 2022. How and why to manipulate your own agent: On the incentives of users of learning agents. Advances in Neural Information Processing Systems 35 (2022), 28080--28094.

[45]

Thodoris Lykouris, Vasilis Syrgkanis, and Éva Tardos. 2016. Learning and efficiency in games with dynamic population. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms. SIAM, 120--129.

Digital Library

[46]

Yishay Mansour, Mehryar Mohri, Jon Schneider, and Balasubramanian Sivan. 2022a. Strategizing against learners in bayesian games. In Conference on Learning Theory. PMLR, 5221--5252.

[47]

Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, and Zhiwei Steven Wu. 2022b. Bayesian exploration: Incentivizing exploration in Bayesian games. Operations Research 70, 2 (2022), 1105--1127.

Digital Library

[48]

Denis Nekipelov, Vasilis Syrgkanis, and Eva Tardos. 2015. Econometrics for learning agents. In Proceedings of the sixteenth acm conference on economics and computation. 1--18.

Digital Library

[49]

Georgy Noarov, Ramya Ramalingam, Aaron Roth, and Stephan Xie. 2023. High-Dimensional Prediction for Sequential Decision Making. arXiv preprint arXiv:2310.17651 (2023).

[50]

Aaron Roth. 2023. Uncertain: Modern Topics in Uncertainty Estimation.

[51]

Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, and Zhiwei Steven Wu. 2020. Multidimensional dynamic pricing for welfare maximization. ACM Transactions on Economics and Computation (TEAC) 8, 1 (2020), 1--35.

Digital Library

[52]

Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. 2016. Watch and learn: Optimizing from revealed preferences feedback. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 949--962.

Digital Library

[53]

Tim Roughgarden. 2015. Intrinsic robustness of the price of anarchy. Journal of the ACM (JACM) 62, 5 (2015), 1--42.

Digital Library

[54]

Mark Sellke and Aleksandrs Slivkins. 2021. The price of incentivizing exploration: A characterization via thompson sampling and sample complexity. In Proceedings of the 22nd ACM Conference on Economics and Computation. 795--796.

Digital Library

[55]

Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I Jordan, and Haifeng Xu. 2022. Sequential information design: Markov persuasion process and its efficient reinforcement learning. arXiv preprint arXiv:2202.10678 (2022).

[56]

Shengjia Zhao, Michael Kim, Roshni Sahoo, Tengyu Ma, and Stefano Ermon. 2021. Calibrating predictions to decisions: A novel approach to multi-class calibration. Advances in Neural Information Processing Systems 34 (2021), 22313--22324.

[57]

Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, and Michael I Jordan. 2022. The sample complexity of online contract design. arXiv preprint arXiv:2211.05732 (2022).

[58]

You Zu, Krishnamurthy Iyer, and Haifeng Xu. 2021. Learning to persuade on the fly: Robustness against ignorance. In Proceedings of the 22nd ACM Conference on Economics and Computation. 927--928.

Digital Library

Index Terms

Efficient Prior-Free Mechanisms for No-Regret Agents
1. Theory of computation
  1. Theory and algorithms for application domains
    1. Algorithmic game theory and mechanism design

Recommendations

Prior-free and prior-dependent regret bounds for Thompson Sampling
NIPS'13: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit than the usual distribution-free and ...
Dueling bandits with weak regret
ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70

We consider online content recommendation with implicit feedback through pairwise comparisons, formalized as the so-called dueling bandit problem. We study the dueling bandit problem in the Condorcet winner setting, and consider two notions of regret: ...
Regret minimization in online Bayesian persuasion: Handling adversarial receiver's types under full and partial feedback models
Abstract
In Bayesian persuasion, an informed sender has to design a signaling scheme that discloses the right amount of information so as to influence the behavior of a self-interested receiver. This kind of strategic interaction is ubiquitous ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

EC '24: Proceedings of the 25th ACM Conference on Economics and Computation

July 2024

1340 pages

ISBN:9798400707049

DOI:10.1145/3670865

Chair:
Dirk Bergemann,
Program Chairs:
Robert Kleinberg,
Daniela Saban

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGecom: Special Interest Group on Economics and Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 December 2024

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

NSF (National Science Foundation)

Conference

EC '24

Sponsor:

SIGecom

EC '24: 25th ACM Conference on Economics and Computation

July 8 - 11, 2024

CT, New Haven, USA

Acceptance Rates

Overall Acceptance Rate 664 of 2,389 submissions, 28%

Upcoming Conference

EC '25

Sponsor:
sigecom

The 25th ACM Conference on Economics and Computation

July 7 - 11, 2025

Stanford , CA , USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
15
Total Downloads

Downloads (Last 12 months)15
Downloads (Last 6 weeks)15

Reflects downloads up to 06 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents