Abstract
In meta-reinforcement learning (meta-RL), agents that condition on context when transferring source policies have been shown to outperform context-free approaches. However, existing methods require large amounts of on-policy experience to adapt to novel tasks, limiting their practicality and sample efficiency. In this paper, we jointly perform off-policy meta-RL and active learning, generating the latent context of a novel task by reusing valuable experiences from source tasks. To compute the importance weight of each source experience for adaptation, we employ maximum mean discrepancy (MMD) as the criterion, minimizing the distance between the experience distributions of the target task and the adapted source tasks in a reproducing kernel Hilbert space (RKHS). By integrating actively queried source experiences with a small amount of on-policy target experience, we demonstrate that this experience sampling scheme benefits fine-tuning of the contextual policy. We then incorporate the scheme into a standard meta-RL framework and verify its effectiveness on four continuous control environments simulated in MuJoCo.
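To make the selection criterion concrete, the following is a minimal NumPy sketch of MMD-based scoring of source experiences against a small on-policy target batch. It is not the authors' implementation: the Gaussian kernel, its bandwidth, the transition featurization, and the top-k selection heuristic are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of a and b.
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_squared(source, target, sigma=1.0):
    # Biased estimator of squared MMD between two sample sets in the RKHS
    # induced by the kernel: MMD^2 = E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)].
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

def source_scores(source, target, sigma=1.0):
    # Heuristic per-sample score (the empirical MMD witness function
    # evaluated at each source point, up to a constant): source transitions
    # that resemble the target batch more than the rest of the source batch
    # score higher, so selecting them shrinks the MMD to the target.
    k_st = gaussian_kernel(source, target, sigma)
    k_ss = gaussian_kernel(source, source, sigma)
    return k_st.mean(axis=1) - k_ss.mean(axis=1)

# Example: rank source transitions (as feature vectors) against a small
# on-policy target batch and keep the top-k for context inference.
rng = np.random.default_rng(0)
source_batch = rng.normal(size=(256, 8))          # e.g. (s, a, r, s') features
target_batch = rng.normal(loc=0.5, size=(32, 8))  # small on-policy target batch
top_k = np.argsort(source_scores(source_batch, target_batch))[-64:]
print(mmd_squared(source_batch[top_k], target_batch))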
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, J., Yan, L., Yu, X., Guan, Y. (2022). Contextual Policy Transfer in Meta-Reinforcement Learning via Active Learning. In: Zhao, X., Yang, S., Wang, X., Li, J. (eds) Web Information Systems and Applications. WISA 2022. Lecture Notes in Computer Science, vol 13579. Springer, Cham. https://doi.org/10.1007/978-3-031-20309-1_31
Print ISBN: 978-3-031-20308-4
Online ISBN: 978-3-031-20309-1