
Actor-Critic with Variable Time Discretization via Sustained Actions

  • Conference paper
Neural Information Processing (ICONIP 2023)

Abstract

Reinforcement learning (RL) methods work in discrete time. To apply RL to inherently continuous problems such as robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings: initially it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in the robotic control environments Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
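The core mechanism, starting with coarse (sparse) time discretization and gradually moving to a fine one, can be pictured as sustaining each selected action for several base environment steps and shrinking that sustain count over training. The sketch below is a minimal illustration only, assuming a Gymnasium-style environment API; the wrapper name, linear schedule, and all parameters are hypothetical and do not reproduce the authors' implementation (see the repository linked in the Notes).

import gymnasium as gym

class SustainedActionWrapper(gym.Wrapper):
    """Repeat (sustain) each agent action for n_sustain base time steps."""

    def __init__(self, env, n_sustain=8):
        super().__init__(env)
        self.n_sustain = n_sustain

    def step(self, action):
        # Accumulate reward over the sustained steps; stop early on episode end.
        total_reward = 0.0
        for _ in range(max(1, self.n_sustain)):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info

def sustain_schedule(env_step, start=8, end=1, anneal_steps=100_000):
    """Linearly anneal the sustain count from a coarse to a fine setting."""
    frac = min(env_step / anneal_steps, 1.0)
    return max(end, round(start - frac * (start - end)))

During training one would periodically set env.n_sustain = sustain_schedule(total_env_steps), so early interaction happens on a coarse time grid and later interaction on the environment's base time step.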


Notes

  1. Defined as the number of failures before the first success (a worked form follows this list).

  2. https://github.com/lychanl/acer-release/releases/tag/SusACER.
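
For illustration only: if a quantity K is defined as the number of failures before the first success in independent trials with success probability p (the trials and p are an assumed reading here, not values given in the paper), then K follows a geometric distribution with

\[
P(K = k) = (1 - p)^k \, p, \quad k = 0, 1, 2, \ldots, \qquad \mathbb{E}[K] = \frac{1 - p}{p}.
\]

For example, p = 0.5 yields an expected value of 1.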

Author information

Corresponding author

Correspondence to Jakub Łyskawa.


Ethics declarations

Ethical Statement

This work does not involve processing personal data. The novel solutions presented in this paper cannot be directly used to collect, process, or infer personal information. We also believe that reinforcement learning methods, including SusACER, are currently not viable solutions for control processes used in policing or the military. This work does not have any ethical implications.

Hyperparameters

In this section we provide the hyperparameters used to obtain the results in Sect. 6. Table 3 contains the parameters common to the off-policy algorithms, namely SusACER, ACER, and SAC. Table 4 contains the parameters shared by SusACER and ACER. Tables 5 and 6 contain the hyperparameters for SAC and PPO, respectively. Table 7 contains the environment-specific reward scaling values for SAC.

Table 3. Common parameters for the off-policy algorithms (SusACER, ACER, SAC).
Table 4. SusACER and ACER hyperparameters.
Table 5. SAC general hyperparameters. For environment-specific hyperparameters, see Table 7.
Table 6. PPO hyperparameters.
Table 7. SAC reward scaling.
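
The table bodies are not reproduced in this extract, but the layering described above (settings common to the off-policy methods, settings shared by the ACER family, and per-algorithm or per-environment overrides such as SAC reward scaling) can be mirrored in configuration code. The sketch below is illustrative only; all field names and values are placeholders, not the values reported in Tables 3, 4, 5, 6, and 7.

from dataclasses import dataclass, field

@dataclass
class CommonOffPolicyConfig:
    # Placeholder analogue of Table 3: shared by SusACER, ACER, and SAC.
    discount_gamma: float = 0.99
    replay_buffer_size: int = 1_000_000
    batch_size: int = 256

@dataclass
class AcerFamilyConfig(CommonOffPolicyConfig):
    # Placeholder analogue of Table 4: shared by SusACER and ACER.
    actor_learning_rate: float = 1e-4
    critic_learning_rate: float = 1e-4

@dataclass
class SacConfig(CommonOffPolicyConfig):
    # Placeholder analogue of Tables 5 and 7: SAC settings plus
    # environment-specific reward scaling.
    learning_rate: float = 3e-4
    reward_scale: dict = field(default_factory=lambda: {
        "Ant": 1.0, "HalfCheetah": 1.0, "Hopper": 1.0, "Walker2D": 1.0,
    })

PPO, being on-policy, would take a separate configuration corresponding to Table 6.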


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Łyskawa, J., Wawrzyński, P. (2024). Actor-Critic with Variable Time Discretization via Sustained Actions. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_37

  • DOI: https://doi.org/10.1007/978-981-99-8079-6_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8078-9

  • Online ISBN: 978-981-99-8079-6

  • eBook Packages: Computer Science, Computer Science (R0)
