
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method

Published: 01 September 2018

Abstract

In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ a multi-timescale stochastic approximation variant of the popular cross entropy optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.
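To make the cross entropy idea mentioned in the abstract concrete, the following is a minimal sketch of the standard batch cross entropy method with a Gaussian sampling distribution. It is not the paper's online, multi-timescale stochastic approximation variant; the function name `cross_entropy_maximize` and all parameter names and defaults are assumptions chosen for illustration.

```python
import numpy as np

def cross_entropy_maximize(f, dim, n_samples=200, elite_frac=0.1,
                           n_iters=50, seed=0):
    """Batch cross entropy search for a maximizer of f over R^dim.

    Illustrative sketch only: a Gaussian model is refit to the
    best-scoring samples at every iteration (all names are hypothetical).
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)          # initial model mean (assumed)
    cov = 25.0 * np.eye(dim)      # broad initial covariance (assumed)
    n_elite = max(2, int(n_samples * elite_frac))

    for _ in range(n_iters):
        # Draw candidate solutions from the current Gaussian model.
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        scores = np.array([f(x) for x in samples])
        # Keep the highest-scoring fraction of the candidates.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the model to the elite set; the small diagonal term
        # keeps the covariance from collapsing to a singular matrix.
        mean = elite.mean(axis=0)
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim)
    return mean

# Usage: maximize a concave quadratic whose optimum is at (1, -2).
target = np.array([1.0, -2.0])
best = cross_entropy_maximize(lambda x: -np.sum((x - target) ** 2), dim=2)
print(best)
```

In the prediction setting, `f` would presumably be a negated error measure of the linear value estimate, evaluated at a candidate weight vector. The model stores a mean vector and a full covariance matrix, so its memory footprint grows quadratically with the dimension, which matches the quadratic scaling in the size of the feature set stated in the abstract.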

Cited By

  • (2023) Off-policy average reward actor-critic with deterministic policy search. Proceedings of the 40th International Conference on Machine Learning, pp. 30130-30203. DOI: 10.5555/3618408.3619660. Online publication date: 23-Jul-2023
  • (2022) Heuristically mining the top-k high-utility itemsets with cross-entropy optimization. Applied Intelligence 52:15, pp. 17026-17041. DOI: 10.1007/s10489-021-02576-z. Online publication date: 1-Dec-2022
    Published In

    Machine Learning, Volume 107, Issue 8-10
    September 2018
    428 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. Cross entropy method
    2. Linear function approximation
    3. Markov decision process
    4. ODE method
    5. Prediction problem
    6. Reinforcement learning
    7. Stochastic approximation algorithm

    Qualifiers

    • Article
