Abstract
Reinforcement Learning (RL) agents are commonly thought of as adaptive decision procedures. They operate on input/output data streams called “states”, “actions”, and “rewards”. Most current research on RL adaptiveness to change assumes that the streams’ signatures (i.e., the arity and types of inputs and outputs) remain the same throughout the agent’s lifetime. As a consequence, natural situations where signatures vary (e.g., when new data streams become available, or when others become obsolete) are not studied. In this paper, we relax this assumption and consider that signature changes define a new learning situation called Protean Learning (PL). When such changes occur, traditional RL agents become undefined and must restart learning. Can better methods be developed under the PL view? To investigate this, we first construct a stream-oriented formalism to properly define PL and signature changes. Then, we run experiments in an idealized PL situation where input addition and deletion occur during the learning process. Results show that a simple PL-oriented method enables graceful adaptation to these arity changes and is more efficient than restarting the process.
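The abstract does not spell out the PL-oriented method, so the sketch below is an illustration only, not the authors' approach: assuming a PyTorch-style linear policy layer and a hypothetical add_inputs helper, it shows one simple way an agent could absorb an input-arity change without restarting, by keeping the learned weights for existing inputs and zero-initializing weights for new ones.

```python
# Illustrative sketch only (not the method from the paper): grow a linear
# policy layer when a new input stream appears, instead of restarting.
import torch
import torch.nn as nn

def add_inputs(layer: nn.Linear, n_new: int) -> nn.Linear:
    """Return a copy of `layer` that accepts `n_new` extra inputs.

    Weights for the pre-existing inputs are preserved, so behaviour on the
    old signature is unchanged; weights for the new inputs start at zero,
    so the agent initially ignores them and learns their effect over time.
    """
    expanded = nn.Linear(layer.in_features + n_new, layer.out_features)
    with torch.no_grad():
        expanded.weight[:, :layer.in_features] = layer.weight
        expanded.weight[:, layer.in_features:] = 0.0
        expanded.bias.copy_(layer.bias)
    return expanded

# Usage: a 4-input policy head gains 2 new sensor streams mid-training.
policy_head = nn.Linear(4, 3)
policy_head = add_inputs(policy_head, n_new=2)
assert policy_head.in_features == 6
```

Input deletion could be handled symmetrically, by dropping the weight columns of the removed streams rather than reinitializing the whole layer.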
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bonnici, I., Gouaïch, A. & Michel, F. Input addition and deletion in reinforcement: towards protean learning. Auton Agent Multi-Agent Syst 36, 4 (2022). https://doi.org/10.1007/s10458-021-09534-6