Adaptive pessimism via target Q-value for offline reinforcement learning
Publisher: Elsevier Science Ltd., United Kingdom
Qualifiers: Research-article