Tang et al., 2018 - Google Patents

Boosting trust region policy optimization by normalizing flows policy

Document ID: 11944935319810581935
Author: Tang Y; Agrawal S
Publication year: 2018
Publication venue: arXiv preprint arXiv:1809.10326

Snippet

We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraints, normalizing flows policy generates samples far from the 'center' of the previous policy iterate, which potentially …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6296—Graphical models, e.g. Bayesian networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition

Similar Documents

Publication	Publication Date	Title
Tang et al.	2018	Boosting trust region policy optimization by normalizing flows policy
Liu et al.	2022	Flow straight and fast: Learning to generate and transfer data with rectified flow
Zhang et al.	2018	Policy optimization as Wasserstein gradient flows
Abbasnejad et al.	2020	Counterfactual vision and language learning
Xu et al.	2022	Discriminator-weighted offline imitation learning from suboptimal demonstrations
Keogh et al.	1999	Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches
Teye et al.	2018	Bayesian uncertainty estimation for batch normalized deep networks
Jie et al.	2010	On a connection between importance sampling and the likelihood ratio policy gradient
Vieillard et al.	2020	Momentum in reinforcement learning
Abdolmaleki et al.	2017	Deriving and improving CMA-ES with information geometric trust regions
Vaswani et al.	2021	A general class of surrogate functions for stable and efficient reinforcement learning
Al-Matouq et al.	2015	Multiple window moving horizon estimation
Pourchot et al.	2018	Importance mixing: Improving sample reuse in evolutionary policy search methods
Amid et al.	2020	An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint
Nonnenmacher et al.	2021	Which Minimizer Does My Neural Network Converge To?
Venkatraman et al.	2024	Amortizing intractable inference in diffusion models for vision, language, and control
Nguyen et al.	2019	InfoCNF: An efficient conditional continuous normalizing flow with adaptive solvers
Fan et al.	2023	Free-form variational inference for Gaussian process state-space models
Sugiyama et al.	2008	Active learning with model selection in linear regression
Finck et al.	2011	Noisy optimization: a theoretical strategy comparison of ES, EGS, SPSA & IF on the noisy sphere
Jesson et al.	2023	ReLU to the rescue: Improve your on-policy actor-critic with positive advantages
Stinis et al.	2024	SDYN-GANs: Adversarial learning methods for multistep generative models for general order stochastic dynamics
Chevallier et al.	2020	Theoretical analysis and simulation methods for Hawkes processes and their diffusion approximation
Shirakawa et al.	2015	Sample reuse in the covariance matrix adaptation evolution strategy based on importance sampling
Xu et al.	2023	Beyond Information Gain: An Empirical Benchmark for Low-Switching-Cost Reinforcement Learning