Boosting trust region policy optimization by normalizing flows policy
Tang et al., 2018
- Document ID
- 11944935319810581935
- Author
- Tang Y
- Agrawal S
- Publication year
- 2018
- Publication venue
- arXiv preprint arXiv:1809.10326
Snippet
We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraints, normalizing flows policy generates samples far from the 'center' of the previous policy iterate, which potentially …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6296—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Tang et al. | Boosting trust region policy optimization by normalizing flows policy | |
Liu et al. | Flow straight and fast: Learning to generate and transfer data with rectified flow | |
Zhang et al. | Policy optimization as wasserstein gradient flows | |
Abbasnejad et al. | Counterfactual vision and language learning | |
Xu et al. | Discriminator-weighted offline imitation learning from suboptimal demonstrations | |
Keogh et al. | Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches. | |
Teye et al. | Bayesian uncertainty estimation for batch normalized deep networks | |
Jie et al. | On a connection between importance sampling and the likelihood ratio policy gradient | |
Vieillard et al. | Momentum in reinforcement learning | |
Abdolmaleki et al. | Deriving and improving cma-es with information geometric trust regions | |
Vaswani et al. | A general class of surrogate functions for stable and efficient reinforcement learning | |
Al-Matouq et al. | Multiple window moving horizon estimation | |
Pourchot et al. | Importance mixing: Improving sample reuse in evolutionary policy search methods | |
Amid et al. | An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint | |
Nonnenmacher et al. | Which Minimizer Does My Neural Network Converge To? | |
Venkatraman et al. | Amortizing intractable inference in diffusion models for vision, language, and control | |
Nguyen et al. | InfoCNF: An efficient conditional continuous normalizing flow with adaptive solvers | |
Fan et al. | Free-form variational inference for Gaussian process state-space models | |
Sugiyama et al. | Active learning with model selection in linear regression | |
Finck et al. | Noisy optimization: a theoretical strategy comparison of es, egs, spsa & if on the noisy sphere | |
Jesson et al. | Relu to the rescue: Improve your on-policy actor-critic with positive advantages | |
Stinis et al. | SDYN-GANs: Adversarial learning methods for multistep generative models for general order stochastic dynamics | |
Chevallier et al. | Theoretical analysis and simulation methods for Hawkes processes and their diffusion approximation | |
Shirakawa et al. | Sample reuse in the covariance matrix adaptation evolution strategy based on importance sampling | |
Xu et al. | Beyond Information Gain: An Empirical Benchmark for Low-Switching-Cost Reinforcement Learning |