Tang et al., 2018 - Google Patents

Boosting trust region policy optimization by normalizing flows policy

Document ID
11944935319810581935
Author
Tang Y
Agrawal S
Publication year
2018
Publication venue
arXiv preprint arXiv:1809.10326

Snippet

We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraints, normalizing flows policy generates samples far from the 'center' of the previous policy iterate, which potentially …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G06F17/5009 Computer-aided design using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6296 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6279 Classification techniques relating to the number of classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/12 Computer systems based on biological models using genetic models
    • G06N3/126 Genetic algorithms, i.e. information processing using digital simulations of the genetic system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/02 Knowledge representation
    • G06N5/022 Knowledge engineering, knowledge acquisition

Similar Documents

Publication Publication Date Title
Tang et al. Boosting trust region policy optimization by normalizing flows policy
Liu et al. Flow straight and fast: Learning to generate and transfer data with rectified flow
Zhang et al. Policy optimization as Wasserstein gradient flows
Abbasnejad et al. Counterfactual vision and language learning
Xu et al. Discriminator-weighted offline imitation learning from suboptimal demonstrations
Keogh et al. Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches.
Teye et al. Bayesian uncertainty estimation for batch normalized deep networks
Jie et al. On a connection between importance sampling and the likelihood ratio policy gradient
Vieillard et al. Momentum in reinforcement learning
Abdolmaleki et al. Deriving and improving CMA-ES with information geometric trust regions
Vaswani et al. A general class of surrogate functions for stable and efficient reinforcement learning
Al-Matouq et al. Multiple window moving horizon estimation
Pourchot et al. Importance mixing: Improving sample reuse in evolutionary policy search methods
Amid et al. An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint
Nonnenmacher et al. Which Minimizer Does My Neural Network Converge To?
Venkatraman et al. Amortizing intractable inference in diffusion models for vision, language, and control
Nguyen et al. InfoCNF: An efficient conditional continuous normalizing flow with adaptive solvers
Fan et al. Free-form variational inference for Gaussian process state-space models
Sugiyama et al. Active learning with model selection in linear regression
Finck et al. Noisy optimization: a theoretical strategy comparison of ES, EGS, SPSA & IF on the noisy sphere
Jesson et al. ReLU to the rescue: Improve your on-policy actor-critic with positive advantages
Stinis et al. SDYN-GANs: Adversarial learning methods for multistep generative models for general order stochastic dynamics
Chevallier et al. Theoretical analysis and simulation methods for Hawkes processes and their diffusion approximation
Shirakawa et al. Sample reuse in the covariance matrix adaptation evolution strategy based on importance sampling
Xu et al. Beyond Information Gain: An Empirical Benchmark for Low-Switching-Cost Reinforcement Learning