
Causal reinforcement learning based on Bayesian networks applied to industrial settings

Published: 01 October 2023

Abstract

The increasing amount of real-time data collected from sensors in industrial environments has accelerated the application of machine learning to decision-making. Reinforcement learning (RL) is a powerful tool for finding optimal policies that achieve a given goal. However, applying standard RL is risky and insufficient in environments where actions can have irreversible consequences and where interpretability and fairness are required. While recent trends in RL can provide guidance based on expert knowledge, they often neither account for uncertainty nor include prior knowledge in the learning process. We propose a causal reinforcement learning alternative based on Bayesian networks (RLBNs) to address this challenge. The RLBN simultaneously models a policy and exploits the joint distribution of the state and action space, reducing uncertainty in unknown situations. We propose a training algorithm for the network’s parameters and structure based on both the reward function and the likelihood of the observed effects and measurements. Our experiments with the CartPole benchmark and with industrial fouling simulated by ordinary differential equations (ODEs) demonstrate that RLBNs are interpretable, secure, flexible, and more robust than their competitors. Our contributions include a novel method that incorporates expert knowledge into the decision-making engine, using Bayesian networks with a predefined structure as a causal graph and a hybrid learning strategy that considers both likelihood and reward, thereby preserving the virtues of the Bayesian network.
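
The hybrid learning strategy described in the abstract scores a Bayesian-network policy by two criteria at once: the likelihood of the observed states and actions, and the reward the induced policy collects in a simulator. The Python sketch below illustrates that idea on a toy discrete problem; it is not the authors' implementation, and the names and choices in it (simulate_episode, HYBRID_WEIGHT, the toy reward and transition model, the random-restart search) are hypothetical stand-ins.

import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 2        # toy discrete state and action spaces
HYBRID_WEIGHT = 0.5               # hypothetical trade-off between likelihood and reward

def log_likelihood(cpt, data):
    # Log-likelihood of observed (state, action) pairs under P(action | state).
    return sum(np.log(cpt[s, a] + 1e-12) for s, a in data)

def simulate_episode(cpt, horizon=50):
    # Hypothetical stand-in for a simulator such as CartPole or a fouling ODE:
    # reward 1 whenever the sampled action matches a fixed "safe" action.
    safe_action = np.arange(N_STATES) % N_ACTIONS
    s, total = rng.integers(N_STATES), 0.0
    for _ in range(horizon):
        a = rng.choice(N_ACTIONS, p=cpt[s])
        total += float(a == safe_action[s])
        s = rng.integers(N_STATES)          # toy state transition
    return total

def hybrid_score(cpt, data):
    # Hybrid objective: explain the demonstrations AND obtain reward.
    return (1 - HYBRID_WEIGHT) * log_likelihood(cpt, data) \
           + HYBRID_WEIGHT * simulate_episode(cpt)

# Expert demonstrations (prior knowledge) given as (state, action) pairs.
demonstrations = [(s, s % N_ACTIONS) for s in range(N_STATES)] * 5

# Random-restart search over the conditional probability table of the action
# node, keeping the best candidate under the hybrid score.
best_cpt, best_score = None, -np.inf
for _ in range(200):
    cpt = rng.dirichlet(np.ones(N_ACTIONS), size=N_STATES)
    score = hybrid_score(cpt, demonstrations)
    if score > best_score:
        best_cpt, best_score = cpt, score

print("best hybrid score:", round(best_score, 2))
print("learned policy P(action | state):")
print(best_cpt.round(2))

In the paper the causal structure is predefined from expert knowledge and both structure and parameters are refined jointly; the particular weighting and search procedure shown above are illustrative assumptions only.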

Cited By

  • (2024) Bayesian Strategy Networks Based Soft Actor-Critic Learning. ACM Transactions on Intelligent Systems and Technology 15(3), 1–24. DOI: 10.1145/3643862. Online publication date: 29-Mar-2024.
  • (2024) Causal Deep Q Networks. Advances and Trends in Artificial Intelligence: Theory and Applications, 254–264. DOI: 10.1007/978-981-97-4677-4_21. Online publication date: 9-Jul-2024.


Published In

Engineering Applications of Artificial Intelligence, Volume 125, Issue C
October 2023, 1603 pages

Publisher

Pergamon Press, Inc., United States

Author Tags

  1. Reinforcement learning
  2. Bayesian networks
  3. Causality
  4. Parameter learning
  5. Dynamic simulators
  6. Ordinary differential equations

Qualifiers

  • Research-article
