Abstract
Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of \(\Omega(T^{2/3})\), where T is the number of time steps. This happens because the existing mechanisms consider the worst-case scenario, in which the means of the agents’ stochastic rewards are separated by a very small amount that depends on T. We make, and exploit, the crucial observation that in most scenarios the separation between the agents’ rewards is rarely a function of T. Moreover, when the rewards of the arms are arbitrarily close, the regret contributed by such sub-optimal arms is minimal. Our idea is to allow the center to indicate the resolution, Δ, with which the agents must be distinguished. This immediately leads us to introduce the notion of Δ-regret. Using sponsored search auctions as a concrete example (the same idea applies to other applications as well), we propose a dominant strategy incentive compatible (DSIC) and individually rational (IR) deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. Remarkably, the proposed mechanism, Δ-UCB, achieves a Δ-regret of \(O(\log T)\) for the case of sponsored search auctions. We first establish the results for single-slot sponsored search auctions and then non-trivially extend the results to the case where multiple slots are to be allocated.
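To make the Δ-regret notion concrete, the following is a minimal illustrative sketch, not the paper’s actual Δ-UCB mechanism: it runs the standard UCB1 index on hypothetical Bernoulli arms and measures Δ-regret, where pulls of arms whose mean lies within Δ of the best arm contribute nothing. The arm means, Δ, and horizon below are assumed example values.

```python
import math
import random

def ucb1_delta_regret(mu, delta, T, seed=0):
    """Run UCB1 on Bernoulli arms with means `mu` for T steps and
    return the Delta-regret: only arms whose gap from the best mean
    exceeds `delta` contribute to the regret."""
    rng = random.Random(seed)
    n = len(mu)
    counts = [0] * n          # number of pulls of each arm
    sums = [0.0] * n          # cumulative reward of each arm
    mu_star = max(mu)
    dregret = 0.0
    for t in range(1, T + 1):
        if t <= n:
            arm = t - 1       # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        gap = mu_star - mu[arm]
        if gap > delta:       # gaps within delta are "indistinguishable"
            dregret += gap
    return dregret

# The 0.58 arm is within delta = 0.05 of the best arm (0.6), so its
# pulls are free; only the 0.3 arm accrues Delta-regret, and UCB1
# pulls it only O(log T) times.
print(ucb1_delta_regret([0.6, 0.58, 0.3], delta=0.05, T=5000))
```

The key point the sketch mirrors is that once arms closer than Δ are excused, the remaining gaps are bounded below by Δ independently of T, which is what enables the \(O(\log T)\) Δ-regret guarantee (the full mechanism additionally handles strategic bids and payments, which this sketch omits).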
Cite this article
Padmanabhan, D., Bhat, S., Prabuchandran, K.J. et al. Dominant strategy truthful, deterministic multi-armed bandit mechanisms with logarithmic regret for sponsored search auctions. Appl Intell 52, 3209–3226 (2022). https://doi.org/10.1007/s10489-021-02387-2