DOI: 10.1145/3377929.3398126
Research article

Learning to walk - reward relevance within an enhanced neuroevolution approach

Published: 08 July 2020

Abstract

Recent advances in human motion sensing technologies and machine learning have enhanced the potential of AI to simulate artificial agents that exhibit human-like movements. Human movements are typically explored via experimental recordings, with the aim of establishing relationships between neural and mechanical activities. A recent trend in AI research shows that AI algorithms work remarkably well when combined with sufficient computing resources and data. One common criticism, however, is that these methods are gradient-based and rely on gradient approximation, and thus suffer from the well-known vanishing gradient problem.
In this paper, the goal is to build an ANN-based controller that enables an agent to walk in a human-like way. In particular, the proposed methodology introduces a new approach to neuroevolution built on NeuroEvolution of Augmenting Topologies (NEAT). The original algorithm has been endowed with a different selection-reproduction mechanism and a modified management of the population, with the aim of improving performance and reducing computational effort. Experiments have demonstrated the effectiveness of the proposed approach and have highlighted the interdependence among three key aspects: the reward framework, the chosen Evolutionary Algorithm, and the hyper-parameter configuration. Consequently, none of these aspects can be ignored, and balancing them is crucial for achieving suitable results and good performance.
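To make the setting concrete, the sketch below wires a plain NEAT evaluation loop to a MuJoCo Walker2d environment using the NEAT-Python library and OpenAI Gym. This is a minimal illustration under stated assumptions, not the paper's enhanced algorithm: the modified selection-reproduction mechanism and population management described above would replace NEAT-Python's DefaultReproduction, the configuration file name 'config-walker2d' is a placeholder, and the pre-0.26 Gym step API is assumed.

    import gym
    import neat

    # Placeholder NEAT-Python config file; among other things it must
    # declare num_inputs=17 and num_outputs=6 to match Walker2d-v2's
    # observation and action spaces.
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         'config-walker2d')

    def eval_genomes(genomes, config):
        # Fitness of each genome = cumulative environment reward over one
        # episode, so the reward design directly shapes selection pressure.
        env = gym.make('Walker2d-v2')
        for genome_id, genome in genomes:
            net = neat.nn.FeedForwardNetwork.create(genome, config)
            obs = env.reset()
            fitness, done = 0.0, False
            while not done:
                # Network outputs act as joint torques; depending on the
                # activation function chosen, they may need scaling or
                # clipping to the environment's action range.
                action = net.activate(obs)
                obs, reward, done, _ = env.step(action)
                fitness += reward
            genome.fitness = fitness
        env.close()

    pop = neat.Population(config)
    pop.add_reporter(neat.StdOutReporter(True))   # per-generation statistics
    winner = pop.run(eval_genomes, n=50)          # evolve for 50 generations

Because fitness here is simply the accumulated reward, any change to the reward framework directly reshapes the selection pressure on the population, which is exactly the interdependence between reward design and the Evolutionary Algorithm that the abstract highlights.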



        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
        July 2020
        1982 pages
        ISBN:9781450371278
        DOI:10.1145/3377929

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
