
The Continuum-Armed Bandit Problem

Published: 01 November 1995

Abstract

In this paper we consider the multiarmed bandit problem where the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much more difficult than the usual one with a finite number of arms because the built-in learning task is now infinite dimensional. We devise a kernel estimator-based learning scheme for the mean reward as a function of the arms. Using this learning scheme, we construct a class of certainty equivalence control with forcing schemes and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the $o(n)$ required for optimality with respect to the average-cost-per-unit-time criterion.
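The abstract describes the learning scheme only in outline. The following is a minimal Python sketch of the general idea, not the paper's construction. It assumes arms in [0, 1], Gaussian reward noise, a Gaussian (Nadaraya-Watson) kernel estimator of the mean reward, a hypothetical forcing schedule (forced uniform exploration at perfect-square times), and a hypothetical bandwidth h_t = t^(-1/5). The function names `kernel_estimate` and `ce_with_forcing` are illustrative. Between forcing times the policy acts with certainty equivalence: it treats the current kernel estimate as the true mean reward and plays the arm that maximizes it over a grid.

```python
import math
import random


def kernel_estimate(x, arms, rewards, bandwidth):
    """Nadaraya-Watson estimate of the mean reward at arm x (Gaussian kernel).

    Returns 0.0 when there are no samples or every kernel weight underflows.
    """
    num = 0.0
    den = 0.0
    for a, r in zip(arms, rewards):
        w = math.exp(-0.5 * ((x - a) / bandwidth) ** 2)
        num += w * r
        den += w
    return num / den if den > 0.0 else 0.0


def ce_with_forcing(mean_fn, n, grid_size=41, noise=0.1, seed=0):
    """Certainty equivalence with forcing on arms in [0, 1].

    Forcing and bandwidth schedules are hypothetical placeholders.
    Returns the arms played and a single-run learning loss:
    n * (max of mean_fn over the grid) - sum of mean_fn over the arms played.
    """
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    arms, rewards = [], []
    k = 1  # next forcing time is k * k (hypothetical: perfect squares)
    for t in range(1, n + 1):
        if t == k * k:
            # Forced step: explore an arm drawn uniformly from [0, 1).
            x = rng.random()
            k += 1
        else:
            # Certainty-equivalence step: act as if the kernel estimate were
            # the true mean reward and play its maximizer over the grid.
            h = t ** -0.2  # hypothetical bandwidth schedule h_t = t^(-1/5)
            x = max(grid, key=lambda g: kernel_estimate(g, arms, rewards, h))
        arms.append(x)
        rewards.append(mean_fn(x) + rng.gauss(0.0, noise))
    # The grid maximum stands in for the supremum of mean_fn over [0, 1].
    best = max(mean_fn(g) for g in grid)
    loss = n * best - sum(mean_fn(a) for a in arms)
    return arms, loss


if __name__ == "__main__":
    arms, loss = ce_with_forcing(lambda x: 1.0 - (x - 0.6) ** 2, n=200)
    print(f"learning loss over {len(arms)} rounds: {loss:.3f}")
```

The ratio `loss / n` is the per-round quantity whose decay to zero corresponds to the $o(n)$ average-cost criterion mentioned in the abstract. The rates proved in the paper depend on its own choice of kernel, bandwidth and forcing sequence, which this sketch does not reproduce.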


Information

Published In

SIAM Journal on Control and Optimization, Volume 33, Issue 6
Nov. 1995
341 pages
ISSN: 0363-0129

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. bandit problems
  2. certainty equivalence with forcing
  3. continuous arms
  4. controlled i.i.d. process
  5. learning loss
  6. stochastic adaptive control

Qualifiers

  • Article
