DOI: 10.5555/3692070.3694518

Tackling non-stationarity in reinforcement learning via causal-origin representation

Published: 21 July 2024

Abstract

In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model changes in the environment explicitly, which often requires prior knowledge of the environment that is impractical to obtain. In this paper, we propose a new perspective: non-stationarity can propagate and accumulate through the complex causal relationships underlying state transitions, compounding its effect on policy learning. We argue that this challenge is more effectively addressed by implicitly tracing the causal origin of the non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP employs a guided updating mechanism to learn a stable graph representation of the state, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in a causal interpretation of non-stationary reinforcement learning, supporting the validity of the causal-origin representation. Experimental results further demonstrate that COREP outperforms existing methods in tackling non-stationarity. The code is available at https://github.com/PKURL/COREP.
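The abstract names the core mechanism, a guided update that keeps a graph representation of the state stable, but gives no implementation detail. Below is a minimal PyTorch sketch of that general idea only: every name (GraphEncoder, COREPSketch, guide_core_update), the attention-style soft adjacency, and the gradient-scaling update rule are illustrative assumptions, not the paper's actual design; the authors' real implementation is in the linked repository.

```python
# Illustrative sketch only: names and the gradient-scaling "guided update"
# are assumptions, not COREP's actual implementation.
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """Treats each state dimension as a node, learns a soft adjacency via
    pairwise attention, and pools one message-passing step into an embedding."""
    def __init__(self, state_dim, hidden_dim):
        super().__init__()
        self.node_proj = nn.Linear(1, hidden_dim)   # scalar state dim -> node feature
        self.attn = nn.Linear(2 * hidden_dim, 1)    # pairwise edge score
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, state):                        # state: (batch, state_dim)
        nodes = self.node_proj(state.unsqueeze(-1))  # (batch, n, hidden)
        n = nodes.size(1)
        pairs = torch.cat([nodes.unsqueeze(2).expand(-1, -1, n, -1),
                           nodes.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        adj = torch.softmax(self.attn(pairs).squeeze(-1), dim=-1)  # soft adjacency
        return self.out(adj @ nodes).mean(dim=1)     # pooled graph embedding

class COREPSketch(nn.Module):
    """Pairs a slowly updated 'core' graph encoder (intended to hold the stable
    causal structure) with a freely updated 'general' one."""
    def __init__(self, state_dim, hidden_dim, core_grad_scale=0.1):
        super().__init__()
        self.core = GraphEncoder(state_dim, hidden_dim)
        self.general = GraphEncoder(state_dim, hidden_dim)
        self.core_grad_scale = core_grad_scale       # hypothetical guidance knob

    def forward(self, state):
        # The concatenation of both embeddings plays the role of the
        # causal-origin representation in this sketch.
        return torch.cat([self.core(state), self.general(state)], dim=-1)

    def guide_core_update(self):
        # Call between loss.backward() and optimizer.step(): shrinking the core
        # gradients makes the core graph drift slowly under non-stationarity.
        for p in self.core.parameters():
            if p.grad is not None:
                p.grad.mul_(self.core_grad_scale)

# Usage with a stand-in loss (in practice this would be the RL objective):
model = COREPSketch(state_dim=8, hidden_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
z = model(torch.randn(16, 8))                        # (16, 64) representation
loss = z.pow(2).mean()
loss.backward()
model.guide_core_update()
opt.step()
```

The design choice illustrated is the split between a slowly drifting component and a fast-adapting one, so that transient environment changes perturb the policy's input representation as little as possible; how COREP actually realizes the guidance is specified in the paper and repository, not here.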



Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning, July 2024, 63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
