DOI: 10.5555/3692070.3694518

Tackling non-stationarity in reinforcement learning via causal-origin representation

Published: 21 July 2024

Abstract

In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model changes in the environment explicitly, which often requires prior knowledge of the environment that is impractical to obtain. In this paper, we propose a new perspective: non-stationarity can propagate and accumulate through the complex causal relationships underlying state transitions, compounding its effect on policy learning. We argue that this challenge is more effectively addressed by implicitly tracing the causal origin of the non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP employs a guided updating mechanism to learn a stable graph representation of the state, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in a causal interpretation of non-stationary reinforcement learning, supporting the validity of the causal-origin representation. Experimental results further demonstrate that COREP outperforms existing methods in tackling non-stationarity. The code is available at https://github.com/PKURL/COREP.
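The abstract names the core mechanism, a guided update that keeps a graph representation of the state stable, but gives no implementation detail. Below is a minimal PyTorch sketch of that general idea only: every name (GraphEncoder, COREPSketch, guide_core_update), the attention-style soft adjacency, and the gradient-scaling update rule are illustrative assumptions, not the paper's actual design; the authors' real implementation is in the linked repository.

```python
# Illustrative sketch only: names and the gradient-scaling "guided update"
# are assumptions, not COREP's actual implementation.
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """Treats each state dimension as a node, learns a soft adjacency via
    pairwise attention, and pools one message-passing step into an embedding."""
    def __init__(self, state_dim, hidden_dim):
        super().__init__()
        self.node_proj = nn.Linear(1, hidden_dim)   # scalar state dim -> node feature
        self.attn = nn.Linear(2 * hidden_dim, 1)    # pairwise edge score
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, state):                        # state: (batch, state_dim)
        nodes = self.node_proj(state.unsqueeze(-1))  # (batch, n, hidden)
        n = nodes.size(1)
        pairs = torch.cat([nodes.unsqueeze(2).expand(-1, -1, n, -1),
                           nodes.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        adj = torch.softmax(self.attn(pairs).squeeze(-1), dim=-1)  # soft adjacency
        return self.out(adj @ nodes).mean(dim=1)     # pooled graph embedding

class COREPSketch(nn.Module):
    """Pairs a slowly updated 'core' graph encoder (intended to hold the stable
    causal structure) with a freely updated 'general' one."""
    def __init__(self, state_dim, hidden_dim, core_grad_scale=0.1):
        super().__init__()
        self.core = GraphEncoder(state_dim, hidden_dim)
        self.general = GraphEncoder(state_dim, hidden_dim)
        self.core_grad_scale = core_grad_scale       # hypothetical guidance knob

    def forward(self, state):
        # The concatenation of both embeddings plays the role of the
        # causal-origin representation in this sketch.
        return torch.cat([self.core(state), self.general(state)], dim=-1)

    def guide_core_update(self):
        # Call between loss.backward() and optimizer.step(): shrinking the core
        # gradients makes the core graph drift slowly under non-stationarity.
        for p in self.core.parameters():
            if p.grad is not None:
                p.grad.mul_(self.core_grad_scale)

# Usage with a stand-in loss (in practice this would be the RL objective):
model = COREPSketch(state_dim=8, hidden_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
z = model(torch.randn(16, 8))                        # (16, 64) representation
loss = z.pow(2).mean()
loss.backward()
model.guide_core_update()
opt.step()
```

The design choice illustrated is the split between a slowly drifting component and a fast-adapting one, so that transient environment changes perturb the policy's input representation as little as possible; how COREP actually realizes the guidance is specified in the paper and repository, not here.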



Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning, July 2024, 63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
