
Offline Causal Imitation Learning with Latent Confounders

  • Conference paper
Cognitive Computation and Systems (ICCCS 2022)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1732))

Abstract

Learning an imitating policy offline from expert demonstrations is an important yet challenging problem. Despite their success, most existing methods assume that the data are free of latent confounders. However, such unobserved confounders can appear in many real-world applications and lead to sub-optimal policies. In this paper, we propose an integrated two-stage algorithm for offline causal imitation learning that allows for the existence of latent confounders. In Stage 1, we determine whether latent confounders are present, using a causal discovery method based on conditional independence tests. In Stage 2, we adopt behavioral cloning or a variant of instrumental variable regression, depending on whether the data are unconfounded or confounded, to eliminate possible confounding influences. Experiments on a robotic arm control task verify the efficacy of the approach in both confounded and unconfounded situations.
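
The two-stage idea can be illustrated with a minimal linear sketch. This is not the paper's implementation: the synthetic data model, the instrument `z`, the Fisher z partial-correlation test, and the use of classic two-stage least squares in place of the paper's instrumental variable variant are all assumptions made for illustration, and every function name is hypothetical. The sketch relies on the fact that, for an instrument `z` that influences the state `s` but not the latent confounder, `z` is independent of the action `a` given `s` only when no confounder is present.

```python
import math
import numpy as np

def simulate(n, confounded, theta=2.0, seed=0):
    """Synthetic demonstrations (assumed model): a = theta * s (+ u) + noise."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # instrument: affects s, independent of u
    u = rng.normal(size=n)                 # latent confounder
    c = 1.0 if confounded else 0.0
    s = z + c * u + 0.5 * rng.normal(size=n)
    a = theta * s + c * u + 0.5 * rng.normal(size=n)
    return z, s, a

def ols_slope(x, y):
    """Slope of y on x with intercept; plays the role of behavioral cloning here."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def partial_corr_pvalue(x, y, cond):
    """Two-sided Fisher z-test p-value for x independent of y given cond."""
    C = np.column_stack([np.ones_like(cond), cond])
    rx = x - C @ np.linalg.lstsq(C, x, rcond=None)[0]
    ry = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    stat = math.sqrt(len(x) - 1 - 3) * math.atanh(r)  # one conditioning variable
    return math.erfc(abs(stat) / math.sqrt(2))

def two_stage_least_squares(z, s, a):
    """Classic 2SLS: project s onto z, then regress a on the projection."""
    Z = np.column_stack([np.ones_like(z), z])
    s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]
    return ols_slope(s_hat, a)

def fit_policy(z, s, a, alpha=0.01):
    """Stage 1: test z independent of a given s. Stage 2: pick the estimator."""
    if partial_corr_pvalue(z, a, s) < alpha:   # dependence suggests confounding
        return two_stage_least_squares(z, s, a)
    return ols_slope(s, a)

if __name__ == "__main__":
    for confounded in (False, True):
        z, s, a = simulate(5000, confounded)
        print(confounded, "BC:", ols_slope(s, a), "selected:", fit_policy(z, s, a))
```

In the confounded setting of this toy model, plain regression of `a` on `s` overestimates the policy coefficient, while the instrumented estimate stays close to the true value `theta`.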

S. Huang and Y. Zeng—Equal contribution.

Author information

Corresponding author

Correspondence to Ruichu Cai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Huang, S., Zeng, Y., Cai, R., Hao, Z., Sun, F. (2023). Offline Causal Imitation Learning with Latent Confounders. In: Sun, F., Li, J., Liu, H., Chu, Z. (eds) Cognitive Computation and Systems. ICCCS 2022. Communications in Computer and Information Science, vol 1732. Springer, Singapore. https://doi.org/10.1007/978-981-99-2789-0_19

  • DOI: https://doi.org/10.1007/978-981-99-2789-0_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-2788-3

  • Online ISBN: 978-981-99-2789-0

  • eBook Packages: Computer Science (R0)
