Contextual Policy Transfer in Meta-Reinforcement Learning via Active Learning

Jingchi Jiang, Lian Yan, Xuehui Yu & Yi Guan

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13579)

Included in the following conference series:

International Conference on Web Information Systems and Applications

Abstract

In meta-reinforcement learning (meta-RL), agents that consider the context when transferring source policies have been shown to outperform context-free approaches. However, existing approaches require large amounts of on-policy experience to adapt to novel tasks, which limits their practicality and sample efficiency. In this paper, we jointly perform off-policy meta-RL and active learning to generate the latent context of the novel task by reusing valuable experiences from source tasks. To calculate the importance weights of source experiences for adaptation, we employ maximum mean discrepancy (MMD) as the criterion and minimize the distance between the experience distributions of the target task and the adapted source tasks in a reproducing kernel Hilbert space (RKHS). By integrating source experiences selected through active queries with a small amount of on-policy target experience, we demonstrate that this experience sampling benefits the fine-tuning of the contextual policy. We then incorporate it into a standard meta-RL framework and verify its effectiveness on four continuous control environments simulated in MuJoCo.
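The MMD criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, its bandwidth, the exponentiated-gradient update used to fit the weights, and the names rbf_kernel, weighted_mmd2 and fit_weights are all assumptions made for illustration. Each row of source and target stands in for a flattened experience vector. The sketch computes the squared MMD between importance-weighted source samples and target samples, then fits weights on the probability simplex that shrink that distance.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def weighted_mmd2(source, target, weights, bandwidth=1.0):
    """Biased estimate of squared MMD between weighted source and target samples.

    weights is assumed to be non-negative and to sum to 1.
    """
    k_ss = rbf_kernel(source, source, bandwidth)
    k_st = rbf_kernel(source, target, bandwidth)
    k_tt = rbf_kernel(target, target, bandwidth)
    return float(
        weights @ k_ss @ weights
        - 2.0 * weights @ k_st.mean(axis=1)
        + k_tt.mean()
    )


def fit_weights(source, target, bandwidth=1.0, lr=0.5, steps=200):
    """Fit simplex weights for the source samples by exponentiated gradient.

    Assumption: this optimizer is a stand-in chosen for brevity, not the
    procedure described in the paper.
    """
    k_ss = rbf_kernel(source, source, bandwidth)
    k_st_mean = rbf_kernel(source, target, bandwidth).mean(axis=1)
    w = np.full(len(source), 1.0 / len(source))
    for _ in range(steps):
        # Gradient of w^T K_ss w - 2 w^T k_st_mean with respect to w.
        grad = 2.0 * (k_ss @ w) - 2.0 * k_st_mean
        w = w * np.exp(-lr * grad)
        w /= w.sum()  # renormalize so the weights stay on the simplex
    return w


# Toy data: half of the source samples resemble the target, half do not.
rng = np.random.default_rng(0)
near = rng.normal(0.0, 1.0, size=(50, 2))
far = rng.normal(5.0, 1.0, size=(50, 2))
source = np.vstack([near, far])
target = rng.normal(0.0, 1.0, size=(20, 2))

uniform = np.full(len(source), 1.0 / len(source))
weights = fit_weights(source, target)

print("MMD^2 with uniform weights:", weighted_mmd2(source, target, uniform))
print("MMD^2 with fitted weights: ", weighted_mmd2(source, target, weights))
print("weight mass on target-like source samples:", weights[:50].sum())
```

In this toy setting the fitted weights move mass toward the source samples that resemble the target, which is the intuition behind reusing only the valuable source experiences during adaptation.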

Notes

1. https://github.com/deepmind/mujoco

Author information

Authors and Affiliations

AIoT Research Center, Harbin Institute of Technology, Harbin, 150001, China
Jingchi Jiang
Language Technology Research Center, Harbin Institute of Technology, Harbin, 150001, China
Lian Yan, Xuehui Yu & Yi Guan

Corresponding author

Correspondence to Jingchi Jiang.

Editor information

Editors and Affiliations

National University of Defense Technology, Changsha, China
Xiang Zhao
Guangzhou University, Guangzhou, China
Shiyu Yang
Tianjin University, Tianjin, China
Xin Wang
Deakin University, Melbourne, VIC, Australia
Jianxin Li

About this paper

Cite this paper

Jiang, J., Yan, L., Yu, X., Guan, Y. (2022). Contextual Policy Transfer in Meta-Reinforcement Learning via Active Learning. In: Zhao, X., Yang, S., Wang, X., Li, J. (eds) Web Information Systems and Applications. WISA 2022. Lecture Notes in Computer Science, vol 13579. Springer, Cham. https://doi.org/10.1007/978-3-031-20309-1_31

DOI: https://doi.org/10.1007/978-3-031-20309-1_31
Published: 08 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20308-4
Online ISBN: 978-3-031-20309-1
eBook Packages: Computer Science, Computer Science (R0)
