Active Inference for Stochastic Control

  • Conference paper
  • First Online:
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

Active inference has emerged as an alternative approach to control problems given its intuitive (probabilistic) formalism. However, despite its theoretical utility, computational implementations have largely been restricted to low-dimensional, deterministic settings. This paper highlights that this is a consequence of the inability to adequately model stochastic transition dynamics, particularly when an extensive policy (i.e., action trajectory) space must be evaluated during planning. Fortunately, recent advancements propose a modified planning algorithm for finite temporal horizons. We build upon this work to assess the utility of active inference for a stochastic control setting. For this, we simulate the classic windy grid-world task with additional complexities, namely: 1) environment stochasticity; 2) learning of transition dynamics; and 3) partial observability. Our results demonstrate the advantage of using active inference, compared to reinforcement learning, in both deterministic and stochastic settings.
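
As an informal illustration of the task family considered here, the sketch below implements a windy grid-world step function with an optional stochastic wind. The grid size, wind profile, action set, and noise model are illustrative assumptions in the style of the classic task, not the paper's exact configuration.

```python
import random

# Illustrative windy grid-world; the paper's exact grid, wind profile,
# and stochasticity level may differ from the values assumed here.
N_COLS, N_ROWS = 10, 7
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]            # upward push per column
ACTIONS = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}

def step(state, action, stochastic=False):
    """Apply an action plus column-dependent wind; optionally jitter the wind."""
    x, y = state
    dx, dy = ACTIONS[action]
    wind = WIND[x]
    if stochastic and wind > 0:
        wind += random.choice([-1, 0, 1])         # noisy wind strength
    x = min(max(x + dx, 0), N_COLS - 1)
    y = min(max(y + dy + wind, 0), N_ROWS - 1)
    return (x, y)

print(step((3, 3), 'right'))                      # deterministic: (4, 4)
print(step((6, 3), 'right', stochastic=True))     # stochastic variant
```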

Notes

  1. Here, outcomes introduce ambiguity for the agent, as similar outcomes map to different (hidden) states. See Appendix B, Table 3 for implementation details.

  2. The first term in Eq. 5 does not contribute to solving the problem addressed in this paper, because here C only encodes a preference for the goal state. However, for a more informed C, i.e., one that also encodes preferences for immediate reward maximisation, this term would influence action selection.

  3. In implementation, the elements of C should be given a small but finite value to avoid divergence of the \(D_{KL}\) terms in Eq. 4 and Eq. 5, as sketched below.
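
A minimal numerical sketch of this safeguard, assuming a categorical preference vector C over states and an arbitrary posterior Q; the variable names and the value of eps are illustrative, not taken from the paper.

```python
import numpy as np

def kl_divergence(q, c):
    """Categorical KL divergence D_KL(Q || C)."""
    return np.sum(q * (np.log(q) - np.log(c)))

# Preference vector concentrated entirely on the goal state (last of 4 states).
c_raw = np.array([0.0, 0.0, 0.0, 1.0])

# Zero entries make log(C) = -inf, so D_KL diverges whenever Q assigns mass
# to a non-preferred state. Adding a small finite value and renormalising
# keeps the KL terms in Eqs. 4 and 5 well defined.
eps = 1e-16
c_safe = (c_raw + eps) / (c_raw + eps).sum()

q = np.array([0.1, 0.2, 0.3, 0.4])   # example posterior over states
print(kl_divergence(q, c_safe))      # finite, as required
```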

Acknowledgments

AP acknowledges research sponsorship from IITB-Monash Research Academy, Mumbai and Department of Biotechnology, Government of India. AR is funded by the Australian Research Council (Refs: DE170100128 & DP200100757) and Australian National Health and Medical Research Council Investigator Grant (Ref: 1194910). AR is a CIFAR Azrieli Global Scholar in the Brain, Mind & Consciousness Program. AR and NS are affiliated with The Wellcome Centre for Human Neuroimaging supported by core funding from Wellcome [203147/Z/16/Z].

Author information

Corresponding author

Correspondence to Aswin Paul.

Appendices

A Results Level-1 and Level-3 (Non-stochastic Settings)

Fig. 4. Performance comparison of agents in Level-1 of the windy grid-world task. 'RandomAgent' refers to a naive agent that takes all actions with equal probability at every time step.

Fig. 5. A: Performance comparison of active inference agents with learned B, using 5000 and 10000 updates respectively, against the Q-Learning agent in Level-3. 'Q-Learning5K' stands for a Q-Learning agent trained for 5000 time steps, using 10 different random seeds. B: Accuracy of the learned dynamics in terms of deviation from the true dynamics.
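
As a rough illustration of the kind of comparison summarised in panel B of Fig. 5, the deviation between a learned transition model and the true one can be quantified, for example, as a mean absolute difference. The function below is an illustrative assumption, not necessarily the paper's exact measure.

```python
import numpy as np

def transition_deviation(b_learned, b_true):
    """Mean absolute deviation between learned and true transition models.

    Both arrays are assumed to have shape (n_states, n_states, n_actions),
    with each column B[:, s, a] a categorical distribution over next states.
    """
    return np.mean(np.abs(b_learned - b_true))

# Toy example with 2 states and 1 action.
b_true = np.array([[[0.9], [0.2]],
                   [[0.1], [0.8]]])
b_learned = np.array([[[0.7], [0.3]],
                      [[0.3], [0.7]]])
print(transition_deviation(b_learned, b_true))   # 0.15
```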

B Outcome Modalities for POMDPs

In the partially observable setting, we considered two outcome modalities, both of which are functions of the 'side' and 'down' coordinates defined for every state in Fig. 1. Examples of the coordinates and modalities are given below: the first outcome modality is the sum of the coordinates, and the second is their product.

Table 3. Outcome modalities specifications

These outcome modalities are similar for many states (e.g., states 2 and 11 have the same outcome modalities; see Table 3). The results demonstrate the ability of the active inference agent to perform optimal inference and planning in the face of this ambiguity. One of the outputs from 'SPM_MDP_VB_XX.m' is 'MDP.P', which returns the action probabilities the agent will use at each time step for a given POMDP. This distribution was used to conduct multiple trials to evaluate the success rate of the active inference agent.
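
A minimal sketch of how such ambiguous modalities can be constructed is given below; the coordinate values are hypothetical, and the paper's actual assignments for each state are those in Table 3.

```python
# Illustrative sketch: two outcome modalities that are ambiguous because
# different hidden states can map to the same (sum, product) observation.
# The coordinates used here are hypothetical, not those of Table 3.

def outcome_modalities(side, down):
    """Return the two outcome modalities for a state with the given coordinates."""
    return side + down, side * down   # modality 1: sum, modality 2: product

# Two distinct hidden states with, e.g., coordinates (2, 3) and (3, 2) yield
# identical outcomes (5, 6), so observations alone cannot disambiguate them
# and the agent must infer the hidden state.
print(outcome_modalities(2, 3))   # (5, 6)
print(outcome_modalities(3, 2))   # (5, 6)
```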

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Paul, A., Sajid, N., Gopalkrishnan, M., Razi, A. (2021). Active Inference for Stochastic Control. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_47

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_47

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

  • eBook Packages: Computer Science, Computer Science (R0)
