[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3583131.3590433acmconferencesArticle/Chapter ViewAbstractPublication PagesgeccoConference Proceedingsconference-collections
research-article

The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

Published: 12 July 2023 Publication History

Abstract

In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.

Supplementary Material

PDF File (p1221-mace-suppl.pdf)
Supplemental material.

References

[1]
Alberto Alvarez, Steve Dahlskog, Jose Font, and Julian Togelius. 2019. Empowering quality diversity in dungeon design with interactive constrained MAP-Elites. In 2019 IEEE Conference on Games (CoG). IEEE, 1--8.
[2]
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
[3]
James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs. http://github.com/google/jax
[4]
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877--1901.
[5]
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part I 16. Springer, 213--229.
[6]
Leo Cazenille, Nicolas Bredeche, and Nathanael Aubert-Kato. 2019. Exploring Self-Assembling Behaviors in a Swarm of Bio-micro-robots using Surrogate-Assisted MAP-Elites. arXiv preprint arXiv:1910.00230 (2019).
[7]
Felix Chalumeau, Raphael Boige, Bryan Lim, Valentin Macé, Maxime Allard, Arthur Flajolet, Antoine Cully, and Thomas Pierrot. 2022. Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery. arXiv preprint arXiv:2210.03516 (2022).
[8]
Megan Charity, Ahmed Khalifa, and Julian Togelius. 2020. Baba is Y'all: Collaborative Mixed-Initiative Level Design. arXiv preprint arXiv:2003.14294 (2020).
[9]
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. 2021. Decision transformer: Reinforcement learning via sequence modeling. Advances in neural information processing systems 34 (2021), 15084--15097.
[10]
Cédric Colas, Vashisht Madhavan, Joost Huizinga, and Jeff Clune. 2020. Scaling MAP-Elites to deep neuroevolution. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference. 67--75.
[11]
Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. 2015. Robots that can adapt like animals. Nature 521, 7553 (2015), 503--507.
[12]
Antoine Cully and Yiannis Demiris. 2017. Quality and diversity optimization: A unifying modular framework. IEEE Transactions on Evolutionary Computation 22, 2 (2017), 245--259.
[13]
Antoine Cully and Yiannis Demiris. 2018. Hierarchical behavioral repertoires with unsupervised descriptors. In Proceedings of the Genetic and Evolutionary Computation Conference. 69--76.
[14]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[15]
Linhao Dong, Shuang Xu, and Bo Xu. 2018. Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5884--5888.
[16]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
[17]
Sondre A Engebraaten, Jonas Moen, Oleg A Yakimenko, and Kyrre Glette. 2020. A framework for automatic behavior generation in multi-function swarms. Frontiers in Robotics and AI 7 (2020), 579403.
[18]
Manon Flageat, Felix Chalumeau, and Antoine Cully. 2022. Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains. ACM Transactions on Evolutionary Learning (2022).
[19]
Manon Flageat and Antoine Cully. 2020. Fast and stable MAP-Elites in noisy domains using deep grids. arXiv preprint arXiv:2006.14253 (2020).
[20]
Manon Flageat and Antoine Cully. 2023. Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains. arXiv preprint arXiv:2302.00463 (2023).
[21]
Manon Flageat, Bryan Lim, Luca Grillotti, Maxime Allard, Simón C Smith, and Antoine Cully. 2022. Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning. arXiv preprint arXiv:2211.02193 (2022).
[22]
Matthew Fontaine and Stefanos Nikolaidis. 2020. A quality diversity approach to automatically generating human-robot interaction scenarios in shared autonomy. arXiv preprint arXiv:2012.04283 (2020).
[23]
Matthew C. Fontaine, Scott Lee, L. B. Soros, Fernando De Mesentier Silva, Julian Togelius, and Amy K. Hoover. 2019. Mapping Hearthstone Deck Spaces with Map-Elites with Sliding Boundaries. In Proceedings of The Genetic and Evolutionary Computation Conference. ACM.
[24]
C Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. 2021. Brax-A Differentiable Physics Engine for Large Scale Rigid Body Simulation. arXiv preprint arXiv:2106.13281 (2021).
[25]
Scott Fujimoto, Herke Van Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 (2018).
[26]
Michael Janner, Qiyang Li, and Sergey Levine. 2021. Offline reinforcement learning as one big sequence modeling problem. Advances in neural information processing systems 34 (2021), 1273--1286.
[27]
Marija Jegorova, Stéphane Doncieux, and Timothy M Hospedales. 2020. Behavioral repertoire via generative adversarial policy networks. IEEE Transactions on Cognitive and Developmental Systems (2020).
[28]
Niels Justesen, Sebastian Risi, and Jean-Baptiste Mouret. 2019. Map-elites for noisy domains by adaptive sampling. In Proceedings of the Genetic and Evolutionary Computation Conference Companion. 121--122.
[29]
Kuang-Huei Lee, Ofir Nachum, Sherry Yang, Lisa Lee, C. Daniel Freeman, Sergio Guadarrama, Ian Fischer, Winnie Xu, Eric Jang, Henryk Michalewski, and Igor Mordatch. 2022. Multi-Game Decision Transformers. In Advances in Neural Information Processing Systems, Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). https://openreview.net/forum?id=0gouO5saq6K
[30]
Bryan Lim, Maxime Allard, Luca Grillotti, and Antoine Cully. 2022. Accelerated Quality-Diversity for Robotics through Massive Parallelism. arXiv preprint arXiv:2202.01258 (2022).
[31]
Douglas Morrison, Peter Corke, and Jurgen Leitner. 2020. EGAD! an Evolved Grasping Analysis Dataset for diversity and reproducibility in robotic manipulation. IEEE Robotics and Automation Letters (2020).
[32]
Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015).
[33]
Olle Nilsson and Antoine Cully. 2021. Policy gradient assisted map-elites. In Proceedings of the Genetic and Evolutionary Computation Conference. 866--875.
[34]
Olle Nilsson and Antoine Cully. 2021. Policy Gradient Assisted MAP-Elites; Policy Gradient Assisted MAP-Elites. (2021).
[35]
Thomas Pierrot, Valentin Macé, Felix Chalumeau, Arthur Flajolet, Geoffrey Cideron, Karim Beguir, Antoine Cully, Olivier Sigaud, and Nicolas Perrin-Gilbert. 2022. Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization. In GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference.
[36]
Thomas Pierrot, Guillaume Richard, Karim Beguir, and Antoine Cully. 2022. Multi-Objective Quality Diversity Optimization. arXiv preprint arXiv:2202.03057 (2022).
[37]
Justin K Pugh, Lisa B Soros, and Kenneth O. Stanley. 2016. Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI 3 (2016), 40.
[38]
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. (2018).
[39]
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
[40]
Nemanja Rakicevic, Antoine Cully, and Petar Kormushev. 2021. Policy manifold search: Exploring the manifold hypothesis for diversity-based neuroevolution. In Proceedings of the Genetic and Evolutionary Computation Conference. 901--909.
[41]
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, et al. 2022. A generalist agent. arXiv preprint arXiv:2205.06175 (2022).
[42]
Vassilis Vassiliades, Konstantinos I. Chatzilygeroudis, and Jean-Baptiste Mouret. 2016. Scaling Up MAP-Elites Using Centroidal Voronoi Tessellations. CoRR abs/1610.05729 (2016). arXiv:1610.05729 http://arxiv.org/abs/1610.05729
[43]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).
[44]
Tianping Zhang, Yuanqi Li, Yifei Jin, and Jian Li. 2020. AutoAlpha: an Efficient Hierarchical Evolutionary Algorithm for Mining Alpha Factors in Quantitative Investment. arXiv preprint arXiv:2002.08245 (2020).

Cited By

View all
  • (2024)Evolutionary Seeding of Diverse Structural Design Solutions via Topology OptimizationACM Transactions on Evolutionary Learning and Optimization10.1145/3670693Online publication date: 5-Jun-2024
  • (2024)Quality Diversity for Robot Learning: Limitations and Future DirectionsProceedings of the Genetic and Evolutionary Computation Conference Companion10.1145/3638530.3654431(587-590)Online publication date: 14-Jul-2024
  • (2024)Towards Multi-Morphology Controllers with Diversity and Knowledge DistillationProceedings of the Genetic and Evolutionary Computation Conference10.1145/3638529.3654013(367-376)Online publication date: 14-Jul-2024
  • Show More Cited By

Index Terms

  1. The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2023
    1667 pages
    ISBN:9798400701191
    DOI:10.1145/3583131
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 July 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. neuroevolution
    2. quality-diversity
    3. decision transformer

    Qualifiers

    • Research-article

    Conference

    GECCO '23
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)77
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 13 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Evolutionary Seeding of Diverse Structural Design Solutions via Topology OptimizationACM Transactions on Evolutionary Learning and Optimization10.1145/3670693Online publication date: 5-Jun-2024
    • (2024)Quality Diversity for Robot Learning: Limitations and Future DirectionsProceedings of the Genetic and Evolutionary Computation Conference Companion10.1145/3638530.3654431(587-590)Online publication date: 14-Jul-2024
    • (2024)Towards Multi-Morphology Controllers with Diversity and Knowledge DistillationProceedings of the Genetic and Evolutionary Computation Conference10.1145/3638529.3654013(367-376)Online publication date: 14-Jul-2024
    • (2024)No-brainer: Morphological Computation Driven Adaptive Behavior in Soft RobotsFrom Animals to Animats 1710.1007/978-3-031-71533-4_6(81-92)Online publication date: 7-Sep-2024

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media