Limited preference aided imitation learning from imperfect demonstrations
Recommendations
Anomaly Guided Policy Learning from Imperfect Demonstrations
AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
Learning from Demonstrations (LfD) refers to using expert demonstrations combined with the reward information given by the environment to jointly guide the learning of policy in Reinforcement Learning. Previous LfD methods usually assume that provided ...
Unlabeled imperfect demonstrations in adversarial imitation learning
AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence
Adversarial imitation learning has become a widely used imitation learning framework. The discriminator is often trained by taking expert demonstrations and policy trajectories as examples respectively from two categories (positive vs. negative) and the ...
Imitation Learning to Outperform Demonstrators by Directly Extrapolating Demonstrations
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
We consider the problem of imitation learning from suboptimal demonstrations that aims to learn a better policy than demonstrators. Previous methods usually learn a reward function to encode the underlying intention of the demonstrators and use standard ...
Publisher
JMLR.org
Qualifiers
- Research-article
- Research
- Refereed limited