An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method

Authors:

Ajin George Joseph,

Shalabh Bhatnagar

Machine Learning, Volume 107, Issue 8-10

Pages 1385 - 1429

https://doi.org/10.1007/s10994-018-5727-z

Published: 01 September 2018

Abstract

In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ a multi-timescale stochastic approximation variant of the popular cross entropy optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.
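
To make the phrase "model-based search method" concrete, here is a minimal Python sketch of the basic batch cross entropy method on a one-dimensional function. It is not the multi-timescale stochastic approximation variant the abstract refers to, and it does not estimate a value function. The function names, the Gaussian sampling model, the smoothing factor and the test objective `bumpy` are all assumptions made for illustration.

```python
import math
import random


def cross_entropy_maximize(objective, mean=0.0, std=5.0, n_samples=200,
                           elite_frac=0.1, n_iters=50, smoothing=0.7, seed=0):
    """Basic batch cross entropy method for a 1-D objective (illustrative only)."""
    rng = random.Random(seed)
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Sample candidate solutions from the current Gaussian model.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the top-scoring fraction of the samples (the "elite" set).
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        # Refit the Gaussian model to the elite set.
        elite_mean = sum(elite) / n_elite
        elite_var = sum((x - elite_mean) ** 2 for x in elite) / n_elite
        # Smoothed parameter update, so the model does not collapse too early.
        mean = smoothing * elite_mean + (1 - smoothing) * mean
        std = smoothing * math.sqrt(elite_var) + (1 - smoothing) * std
    return mean


def bumpy(x):
    # Hypothetical objective: global maximum at x = 2, with smaller local
    # maxima on either side created by the cosine ripple.
    return -(x - 2.0) ** 2 + 0.5 * math.cos(5.0 * (x - 2.0))


best = cross_entropy_maximize(bumpy)
print(f"estimated maximizer: {best:.4f}")
```

The sampling distribution starts wide and is repeatedly refit to its best samples, so the search concentrates on the region of the global maximum rather than following a local gradient. An online variant of the kind described in the abstract would replace the batch refit with incremental updates driven by a single stream of samples.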

Cited By

Saxena N, Khastagir S, Kolathaya S, Bhatnagar S (2023). Off-policy average reward actor-critic with deterministic policy search. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds.), Proceedings of the 40th International Conference on Machine Learning, pp. 30130-30203. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619660
Song W, Zheng C, Huang C, Liu L (2022). Heuristically mining the top-k high-utility itemsets with cross-entropy optimization. Applied Intelligence 52:15, pp. 17026-17041. Online publication date: 1-Dec-2022.
https://dl.acm.org/doi/10.1007/s10489-021-02576-z

Published In

Machine Learning, Volume 107, Issue 8-10

September 2018

428 pages

ISSN: 0885-6125

Publisher

Kluwer Academic Publishers

United States
