papers

Offline RL

Model-free

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

CQL: Conservative Q-Learning for offline reinforcement learning
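A minimal sketch of the conservative penalty CQL adds on top of the standard Bellman error, assuming a discrete action space and a Q-network that outputs one value per action; names like q_net and the batch layout are illustrative placeholders, not the authors' code:

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, alpha=1.0, gamma=0.99):
    """Bellman error plus the CQL(H) conservative penalty (discrete actions)."""
    s, a, r, s_next, done = batch  # tensors sampled from the offline dataset

    # Standard TD target from a target network.
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_q_net(s_next).max(dim=1).values

    q_all = q_net(s)                                      # Q(s, .) for every action
    q_data = q_all.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for dataset actions

    bellman_error = F.mse_loss(q_data, target)

    # Conservative term: push all action values down (logsumexp),
    # push the values of actions actually present in the data back up.
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_data).mean()

    return bellman_error + alpha * conservative_penalty
```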

Is Pessimism Provably Efficient for Offline RL?

THE IMPORTANCE OF PESSIMISM IN FIXED-DATASET POLICY OPTIMIZATION

Off-Policy Deep Reinforcement Learning without Exploration, BCQ, ICML 2019.

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, BEAR; includes an analysis of OOD actions in Q-learning.

Behavior Regularized Offline Reinforcement Learning, BRAC, generalizing BEAR, BCQ, etc.
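A rough sketch of the behavior-regularized actor objective that BRAC-style methods optimize: maximize Q while penalizing divergence from an estimated behavior policy. The KL penalty, beta, and the policy/critic interfaces below are assumptions for illustration; BEAR and BCQ correspond to other divergence or constraint choices in the same template.

```python
import torch

def brac_actor_loss(q_net, policy, behavior_policy, states, beta=1.0):
    """Behavior-regularized actor objective: maximize Q while staying close
    to an estimated behavior policy (here via a KL penalty)."""
    dist = policy(states)                    # a torch.distributions object
    actions = dist.rsample()                 # reparameterized sample for gradients

    q_value = q_net(states, actions).squeeze(-1)
    behavior_dist = behavior_policy(states)  # e.g. a policy fit by behavior cloning
    kl = torch.distributions.kl_divergence(dist, behavior_dist)

    # Minimize: -Q + beta * divergence; MMD (BEAR) or a perturbation
    # constraint (BCQ) slot into the same framework.
    return (-q_value + beta * kl).mean()
```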

Critic Regularized Regression, CRR, NeurIPS 2020.

Q-Value Weighted Regression: Reinforcement Learning with Limited Data. An extension of [KEEP DOING WHAT WORKED ...]; unaccepted.

Batch Reinforcement Learning Through Continuation Method

Model-based

MOReL: Model-Based Offline Reinforcement Learning

MOPO: Model-based Offline Policy Optimization
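A small sketch of the core MOPO idea: penalize the learned model's reward by an uncertainty estimate when generating short synthetic rollouts. The ensemble interface and the disagreement-based uncertainty proxy below are assumptions for illustration; the paper itself uses the maximum predicted standard deviation across the ensemble.

```python
import torch

def pessimistic_reward(model_ensemble, state, action, lam=1.0):
    """MOPO-style reward: learned reward minus an uncertainty penalty,
    applied to transitions generated by the learned dynamics model."""
    # Assumed interface: each ensemble member maps (state, action) to
    # (next_state_mean, next_state_std, reward).
    preds = [m(state, action) for m in model_ensemble]
    rewards = torch.stack([p[2] for p in preds])        # [ensemble, batch]
    next_means = torch.stack([p[0] for p in preds])     # [ensemble, batch, obs_dim]

    # Uncertainty proxy: disagreement between ensemble mean predictions
    # (a simplification of the paper's max predicted std norm).
    uncertainty = next_means.std(dim=0).norm(dim=-1)    # [batch]

    return rewards.mean(dim=0) - lam * uncertainty
```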

COMBO: Conservative Offline Model-Based Policy Optimization

MODEL-BASED OFFLINE PLANNING, ICLR 2021, 8755

NeurIPS 2020 Offline RL Workshop

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, NeurIPS 2020

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization, ICLR 2021

Applications

OPAL: OFFLINE PRIMITIVE DISCOVERY FOR ACCELERATING OFFLINE REINFORCEMENT LEARNING, offline RL + HRL.

Multi-task Batch Reinforcement Learning with Metric Learning, NeurIPS 2020, multi-task, generalizes to unseen tasks.

Batch Reinforcement Learning Through Continuation Method, ICLR 2020, offline RL + continuation method.

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning, CoRL 2020.

KEEP DOING WHAT WORKED: BEHAVIOR MODELLING PRIORS FOR OFFLINE REINFORCEMENT LEARNING ICLR 2020.

datasets

D4RL
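A minimal example of pulling an offline dataset with the d4rl package (assumes d4rl and MuJoCo are installed; the environment name is just one of the standard benchmarks):

```python
import gym
import d4rl  # noqa: F401 -- importing registers the offline environments with gym

env = gym.make('halfcheetah-medium-v2')

# Raw dataset: a dict of numpy arrays (observations, actions, rewards, terminals, ...).
dataset = env.get_dataset()
print(dataset['observations'].shape)

# Convenience view with next_observations added, ready for Q-learning style methods.
qdataset = d4rl.qlearning_dataset(env)
```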

doc

Offline Reinforcement Learning

NeurIPS 2020 Offline RL Workshop

Video tutorial

code

d4rl_evaluations, tensorflow

AWAC, CQL, MOPO, pytorch

polixir, pytorch

OPE

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, survey of OPE.
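For reference, the simplest estimator covered by such surveys is trajectory-wise importance sampling; a small sketch, where pi_e and pi_b are assumed to be callables returning action probabilities under the evaluation and behavior policies:

```python
import numpy as np

def per_trajectory_is(trajectories, pi_e, pi_b, gamma=0.99):
    """Plain (unweighted) trajectory-wise importance-sampling OPE estimate."""
    estimates = []
    for traj in trajectories:                  # traj: list of (state, action, reward)
        rho, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(a, s) / pi_b(a, s)     # cumulative importance weight
            ret += (gamma ** t) * r            # discounted return
        estimates.append(rho * ret)
    return float(np.mean(estimates))
```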

AUTOREGRESSIVE DYNAMICS MODELS FOR OFFLINE POLICY EVALUATION AND OPTIMIZATION, ICLR 2021.

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies, NeurIPS 2020.

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, code

Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation, code

Batch Policy Learning under Constraints, FQE
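Fitted Q-evaluation (FQE) regresses a Q-function toward the Bellman target of the fixed evaluation policy using only the logged transitions; a minimal sketch, with q_net, target_q_net, policy, and the batch layout as assumed placeholders:

```python
import torch
import torch.nn.functional as F

def fqe_update(q_net, target_q_net, policy, batch, gamma=0.99):
    """One fitted Q-evaluation regression step for a fixed evaluation policy."""
    s, a, r, s_next, done = batch  # transitions from the behavior dataset

    with torch.no_grad():
        a_next = policy(s_next)    # the evaluation policy's action at the next state
        target = r + gamma * (1 - done) * target_q_net(s_next, a_next).squeeze(-1)

    q = q_net(s, a).squeeze(-1)
    return F.mse_loss(q, target)

# After training to convergence, the policy value is estimated as the average of
# Q(s0, policy(s0)) over the dataset's initial states.
```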

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, doubly robust (DR).
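A sketch of the per-trajectory doubly robust recursion, which combines a learned value model with per-step importance weights; q_hat, v_hat, and the policy callables are assumptions for illustration:

```python
def doubly_robust_estimate(trajectory, q_hat, v_hat, pi_e, pi_b, gamma=0.99):
    """Per-trajectory doubly robust OPE estimate via backward recursion:
    DR_t = V_hat(s_t) + rho_t * (r_t + gamma * DR_{t+1} - Q_hat(s_t, a_t))."""
    dr = 0.0
    for s, a, r in reversed(trajectory):       # trajectory: list of (state, action, reward)
        rho = pi_e(a, s) / pi_b(a, s)          # per-step importance weight
        dr = v_hat(s) + rho * (r + gamma * dr - q_hat(s, a))
    return dr
```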

code

DualDICE, from google-research.

github-code, implementation of importance sampling, direct, and hybrid methods for off-policy evaluation.

code - Empirical Study of Off-Policy Policy Evaluation, OPE tools.

BENCHMARKS FOR DEEP OFF-POLICY EVALUATION, [code], ICLR 2021. This release provides: 1) policies for the tasks in the D4RL, DeepMind Locomotion and Control Suite datasets; 2) policies trained with the following algorithms (D4PG, ABM, CRR, SAC, DAPG and BC) and snapshots along the training trajectory. This facilitates benchmarking offline model selection. [auxiliary code], [auxiliary code dice]

IRL

GAIL- Generative Adversarial Imitation Learning
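A brief sketch of the GAIL discriminator update, which classifies expert state-action pairs against the learner's; the discriminator interface and batch layout are assumptions, and the policy side would be trained with an on-policy RL step (TRPO in the paper) on the surrogate reward:

```python
import torch
import torch.nn.functional as F

def gail_discriminator_loss(discriminator, expert_batch, policy_batch):
    """GAIL discriminator update: separate expert (s, a) pairs from policy pairs."""
    expert_logits = discriminator(*expert_batch)   # (states, actions) from the expert
    policy_logits = discriminator(*policy_batch)   # (states, actions) from the learner

    expert_loss = F.binary_cross_entropy_with_logits(
        expert_logits, torch.ones_like(expert_logits))
    policy_loss = F.binary_cross_entropy_with_logits(
        policy_logits, torch.zeros_like(policy_logits))
    return expert_loss + policy_loss

# The learner's surrogate reward is then, e.g., -log(1 - D(s, a)) or log D(s, a).
```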

Sim2Real

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model, CoRR, 2016. Proposes a method that combines an inverse dynamics policy learned from expert demonstrations with a policy trained in the simulator.

Offline Imitation Learning with a Misspecified Simulator, NeurIPS 2020

Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning

IL

Error Bounds of Imitating Policies and Environments. The paper analyzes the value gap between the expert policy and imitated policies under two imitation methods, behavioral cloning and generative adversarial imitation. The results show that generative adversarial imitation can reduce compounding errors compared to behavioral cloning, and thus has better sample complexity.

RL-self-supervised

SELF-SUPERVISED POLICY ADAPTATION DURING DEPLOYMENT

HRL

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Sutton et al., 1999, the options framework.

ML

Pareto

Pareto Multi-Task Learning

Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies, JMLR 2014, Pareto in RL.

Bridging Theory and Algorithm for Domain Adaptation, domain transfer, MMD.
