
Input addition and deletion in reinforcement: towards protean learning

Autonomous Agents and Multi-Agent Systems

Abstract

Reinforcement Learning (RL) agents are commonly thought of as adaptive decision procedures. They operate on input/output data streams called “states”, “actions” and “rewards”. Most current research on RL adaptiveness to change assumes that the streams’ signatures (i.e. the arity and types of inputs and outputs) remain the same throughout the agent’s lifetime. As a consequence, natural situations where signatures vary (e.g. when new data streams become available, or when others become obsolete) are not studied. In this paper, we relax this assumption and consider that signature changes define a new learning situation called Protean Learning (PL). When such changes occur, traditional RL agents become undefined, so they need to restart learning. Can better methods be developed under the PL view? To investigate this, we first construct a stream-oriented formalism to properly define PL and signature changes. Then, we run experiments in an idealized PL situation where input addition and deletion occur during the learning process. Results show that a simple PL-oriented method enables graceful adaptation to these arity changes, and is more efficient than restarting the process.
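To make the studied situation concrete, the following is a minimal, hypothetical sketch of graceful input addition and deletion. It is not the method evaluated in the paper: the `GrowablePolicy` class, its layer sizes, and the zero-initialisation choice are illustrative assumptions only. The idea shown is that an input-signature change can resize the first layer in place while preserving all already-learned weights, rather than restarting learning.

```python
# Hypothetical illustration (not the paper's method): a small PyTorch policy
# network whose input layer can grow or shrink in place when the input
# signature changes, keeping the already-learned weights.
import torch
import torch.nn as nn


class GrowablePolicy(nn.Module):
    def __init__(self, n_inputs: int, n_hidden: int, n_actions: int):
        super().__init__()
        self.fc_in = nn.Linear(n_inputs, n_hidden)
        self.fc_out = nn.Linear(n_hidden, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc_out(torch.relu(self.fc_in(x)))

    @torch.no_grad()
    def add_input(self) -> None:
        # Append one input column, initialised to zero so the policy's
        # behaviour is unchanged until learning exploits the new stream.
        old = self.fc_in
        new = nn.Linear(old.in_features + 1, old.out_features)
        new.weight.zero_()
        new.weight[:, : old.in_features] = old.weight
        new.bias.copy_(old.bias)
        self.fc_in = new  # an optimiser must be pointed at the new parameters

    @torch.no_grad()
    def remove_input(self, index: int) -> None:
        # Drop the column of an input stream that became obsolete.
        old = self.fc_in
        keep = [i for i in range(old.in_features) if i != index]
        new = nn.Linear(old.in_features - 1, old.out_features)
        new.weight.copy_(old.weight[:, keep])
        new.bias.copy_(old.bias)
        self.fc_in = new


policy = GrowablePolicy(n_inputs=3, n_hidden=32, n_actions=2)
policy.add_input()             # a fourth input stream becomes available
_ = policy(torch.zeros(1, 4))  # forward pass with the new arity
policy.remove_input(0)         # the first stream becomes obsolete
_ = policy(torch.zeros(1, 3))
```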



Author information

Corresponding author

Correspondence to Iago Bonnici.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Bonnici, I., Gouaïch, A. & Michel, F. Input addition and deletion in reinforcement: towards protean learning. Auton Agent Multi-Agent Syst 36, 4 (2022). https://doi.org/10.1007/s10458-021-09534-6

  • DOI: https://doi.org/10.1007/s10458-021-09534-6
