
Linearly Parameterized Bandits

Published: 01 May 2010

Abstract

We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vector Z ∈ R^r, where r ≥ 2. The objective is to minimize the cumulative regret and Bayes risk. When the set of arms corresponds to the unit sphere, we prove that the regret and Bayes risk are of order Θ(r √T), by establishing a lower bound for an arbitrary policy and by showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases. The phase-based policy is also shown to be effective if the set of arms satisfies a strong convexity condition. For the case of a general set of arms, we describe a near-optimal policy whose regret and Bayes risk admit upper bounds of the form O(r √T log^(3/2) T).
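The phase-based explore/exploit idea from the abstract can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions (a fixed exploration budget of r·√T rounds and coordinate-direction probes), not the paper's exact phase schedule or constants: arms are unit vectors u in R^r, a pull of u yields u·Z plus Gaussian noise, and after exploration the policy commits to the greedy arm Ẑ/‖Ẑ‖.

```python
import numpy as np

def run_phased_policy(r=4, T=2000, sigma=0.1, seed=0):
    """One run of a simple explore/exploit phased policy on the unit
    sphere in R^r (illustrative sketch, not the paper's policy).
    Exploration: pull each coordinate direction e_i in round-robin and
    average the observed rewards to estimate Z.
    Exploitation: play the greedy arm Z_hat / ||Z_hat||.
    Returns the cumulative (pseudo-)regret against the best arm."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=r)
    Z /= np.linalg.norm(Z)           # unknown parameter; normalized so ||Z|| = 1
    best_reward = 1.0                # max over unit vectors u of u . Z equals ||Z||

    n_explore = r * int(np.sqrt(T))  # O(r sqrt(T)) exploration rounds
    sums = np.zeros(r)
    counts = np.zeros(r)
    regret = 0.0
    for t in range(T):
        if t < n_explore:
            i = t % r
            u = np.eye(r)[i]                         # probe direction e_i
            x = u @ Z + sigma * rng.normal()         # noisy linear reward
            sums[i] += x
            counts[i] += 1
        else:
            Z_hat = sums / np.maximum(counts, 1)     # per-coordinate averages
            u = Z_hat / max(np.linalg.norm(Z_hat), 1e-12)  # greedy unit-norm arm
        regret += best_reward - u @ Z                # expected per-round regret
    return regret
```

With these parameters the exploration phase dominates the regret, and the total stays far below the Θ(T) cost of a non-learning policy, consistent with the sublinear (order r √T) behavior the abstract describes for the phased policy.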



Published In

Mathematics of Operations Research, Volume 35, Issue 2, May 2010, 241 pages

Publisher

INFORMS, Linthicum, MD, United States

Publication History

Received: 19 January 2009

Author Tags

1. adaptive control
2. multi-armed bandit
3. parametric model


Cited By

• "To Interfere or Not To Interfere," Operations Research 72(6), pp. 2391-2412, Nov 2024. doi:10.1287/opre.2023.0363
• "Online Learning and Pricing for Service Systems with Reusable Resources," Operations Research 72(3), pp. 1203-1241, May 2024. doi:10.1287/opre.2022.2381
• "Pricing and Positioning of Horizontally Differentiated Products with Incomplete Demand Information," Operations Research 72(6), pp. 2446-2466, Nov 2024. doi:10.1287/opre.2021.0093
• "Tiered Assortment," Management Science 70(8), pp. 5481-5501, Aug 2024. doi:10.1287/mnsc.2023.4940
• "Optimal Learning for Structured Bandits," Management Science 70(6), pp. 3951-3998, Jun 2024. doi:10.1287/mnsc.2020.02108
• "Influence Maximization via Graph Neural Bandits," Proc. 30th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, pp. 771-781, Aug 2024. doi:10.1145/3637528.3671983
• "Analyzing Queueing Problems via Bandits With Linear Reward & Nonlinear Workload Fairness," IEEE Trans. Mobile Computing 23(4), pp. 3410-3423, Apr 2024. doi:10.1109/TMC.2023.3276158
• "Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits," IEEE Trans. Information Theory 70(1), pp. 372-388, Jan 2024. doi:10.1109/TIT.2023.3267732
• "Multi-armed bandits with dependent arms," Machine Learning 113(1), pp. 45-71, Jan 2024. doi:10.1007/s10994-023-06457-z
• "Multi-agent learning with heterogeneous linear contextual bandits," Proc. 37th Internat. Conf. Neural Information Processing Systems, pp. 78768-78790, Dec 2023. doi:10.5555/3666122.3669567
