Abstract
Reinforcement Learning (RL) agents are commonly thought of as adaptive decision procedures. They operate on input/output data streams called “states”, “actions”, and “rewards”. Most current research on RL adaptiveness to change assumes that the streams’ signatures (i.e., the arity and types of inputs and outputs) remain the same throughout the agent’s lifetime. As a consequence, natural situations where signatures vary (e.g., when new data streams become available, or when others become obsolete) are not studied. In this paper, we relax this assumption and consider that signature changes define a new learning situation called Protean Learning (PL). When such changes occur, traditional RL agents become undefined and must restart learning. Can better methods be developed under the PL view? To investigate this, we first construct a stream-oriented formalism to properly define PL and signature changes. Then, we run experiments in an idealized PL situation where input addition and deletion occur during the learning process. Results show that a simple PL-oriented method enables graceful adaptation to these arity changes and is more efficient than restarting the process.
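The abstract does not spell out the PL-oriented method, so the sketch below is an illustration only, not the authors' approach: assuming a PyTorch-style linear policy layer and a hypothetical add_inputs helper, it shows one simple way an agent could absorb an input-arity change without restarting, by keeping the learned weights for existing inputs and zero-initializing weights for new ones.

```python
# Illustrative sketch only (not the method from the paper): grow a linear
# policy layer when a new input stream appears, instead of restarting.
import torch
import torch.nn as nn

def add_inputs(layer: nn.Linear, n_new: int) -> nn.Linear:
    """Return a copy of `layer` that accepts `n_new` extra inputs.

    Weights for the pre-existing inputs are preserved, so behaviour on the
    old signature is unchanged; weights for the new inputs start at zero,
    so the agent initially ignores them and learns their effect over time.
    """
    expanded = nn.Linear(layer.in_features + n_new, layer.out_features)
    with torch.no_grad():
        expanded.weight[:, :layer.in_features] = layer.weight
        expanded.weight[:, layer.in_features:] = 0.0
        expanded.bias.copy_(layer.bias)
    return expanded

# Usage: a 4-input policy head gains 2 new sensor streams mid-training.
policy_head = nn.Linear(4, 3)
policy_head = add_inputs(policy_head, n_new=2)
assert policy_head.in_features == 6
```

Input deletion could be handled symmetrically, by dropping the weight columns of the removed streams rather than reinitializing the whole layer.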
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bonnici, I., Gouaïch, A. & Michel, F. Input addition and deletion in reinforcement: towards protean learning. Auton Agent Multi-Agent Syst 36, 4 (2022). https://doi.org/10.1007/s10458-021-09534-6