Abstract
Learning an imitating policy offline from expert demonstrations is a significant yet challenging problem. Despite their success, most existing methods assume that the data are free of latent confounders. However, such unobserved confounders can appear in many real-world applications and lead to sub-optimal policies. In this paper, we propose an integrated two-stage algorithm for offline causal imitation learning that allows for the existence of latent confounders. In Stage 1, we determine whether latent confounders are present, using a causal discovery method based on conditional independence tests. In Stage 2, we adopt behavioral cloning or a variant of instrumental variable regression, according to whether the data are unconfounded or confounded, to eliminate possible confounding effects. Experiments on a robotic arm control task verify the efficacy of the method in both confounded and unconfounded settings.
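The two-stage idea can be illustrated with a minimal sketch on a toy linear problem. This is not the chapter's algorithm: the data generator, the single partial-correlation test standing in for the causal discovery step, the plain two-stage least squares standing in for the instrumental variable variant, and all names (`simulate`, `partial_corr_pvalue`, `fit_policy`, `true_gain`) are assumptions made for illustration. The sketch also assumes that a valid instrument `z` is known, that it affects the action only through the state `x`, and that behavioral cloning is the choice for the unconfounded case.

```python
import math
import numpy as np


def simulate(n, confounded, rng, true_gain=1.5):
    """Toy expert data: instrument z -> state x -> action a, optional latent u."""
    z = rng.normal(size=n)                    # instrument (hypothetical, e.g. an earlier state)
    u = rng.normal(size=n)                    # latent confounder, never observed by the learner
    s = 1.0 if confounded else 0.0
    x = z + s * u + 0.5 * rng.normal(size=n)
    a = true_gain * x + s * u + 0.5 * rng.normal(size=n)
    return z, x, a


def partial_corr_pvalue(z, a, x):
    """Fisher z-test of 'z independent of a given x' (linear-Gaussian assumption)."""
    def residual(y, c):
        y, c = y - y.mean(), c - c.mean()
        return y - c * (c @ y) / (c @ c)      # remove the linear effect of c from y

    rz, ra = residual(z, x), residual(a, x)
    r = (rz @ ra) / math.sqrt((rz @ rz) * (ra @ ra))
    r = float(np.clip(r, -0.999999, 0.999999))
    stat = math.sqrt(len(z) - 1 - 3) * abs(math.atanh(r))  # one conditioning variable
    return math.erfc(stat / math.sqrt(2))     # two-sided normal p-value


def fit_policy(z, x, a, alpha=0.001):
    """Stage 1: test for confounding. Stage 2: pick the matching linear estimator."""
    # With a latent u -> x and u -> a, conditioning on x opens the path z - u -> a,
    # so rejecting independence is read here as evidence of a latent confounder.
    confounded = partial_corr_pvalue(z, a, x) < alpha
    zc, xc, ac = z - z.mean(), x - x.mean(), a - a.mean()
    if confounded:
        x_hat = zc * (zc @ xc) / (zc @ zc)    # first stage: project the state on the instrument
        gain = (x_hat @ ac) / (x_hat @ x_hat) # second stage: regress the action on the projection
    else:
        gain = (xc @ ac) / (xc @ xc)          # behavioral cloning as ordinary least squares
    return confounded, float(gain)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for case in (False, True):
        flag, gain = fit_policy(*simulate(5000, case, rng))
        print(f"truly confounded={case}  detected={flag}  estimated gain={gain:.3f}")
```

In the confounded toy case, ordinary least squares on `x` alone would absorb the effect of `u` into the gain, whereas the instrument-based estimate does not, which is the motivation for branching on the Stage 1 result.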
S. Huang and Y. Zeng—Equal contribution.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, S., Zeng, Y., Cai, R., Hao, Z., Sun, F. (2023). Offline Causal Imitation Learning with Latent Confounders. In: Sun, F., Li, J., Liu, H., Chu, Z. (eds) Cognitive Computation and Systems. ICCCS 2022. Communications in Computer and Information Science, vol 1732. Springer, Singapore. https://doi.org/10.1007/978-981-99-2789-0_19
DOI: https://doi.org/10.1007/978-981-99-2789-0_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2788-3
Online ISBN: 978-981-99-2789-0
eBook Packages: Computer Science, Computer Science (R0)